The paper considers functional linear regression, where scalar responses $Y_1, \dots, Y_n$ are modeled in dependence of random functions $X_1, \dots, X_n$. We propose a smoothing splines estimator for the functional slope parameter based on a slight modification of the usual penalty. Theoretical analysis concentrates on the error in an out-of-sample prediction of the response for a new random function $X_{n+1}$. It is shown that rates of convergence of the prediction error depend on the smoothness of the slope function and on the structure of the predictors. We then prove that these rates are optimal in the sense that they are minimax over large classes of possible slope functions and distributions of the predictive curves. For the case of models with errors-in-variables the smoothing spline estimator is modified by using a denoising correction of the covariance matrix of discretized curves. The methodology is then applied to a real case study where the aim is to predict the maximum of the concentration of ozone by using the curve of this concentration measured the preceding day.

1. Introduction. In a number of important applications the outcome of a response variable $Y$ depends on the variation of an explanatory variable $X$ over time (or age, etc.). An example is the application motivating our study: the data consist of repeated measurements of pollutant indicators in the area of Toulouse over the course of a day that are used to explain the maximum (peak) of pollution for the next day. Generally, a linear regression model linking observations $Y_i$ of a response variable with $p$ repeated measures of an explanatory variable may be written in the form

(1.1)  $Y_i = \alpha_0 + \frac{1}{p}\sum_{j=1}^{p} \alpha_j X_i(t_j) + \epsilon_i^*, \quad i = 1, \dots, n.$

Here $t_1 < \cdots < t_p$ denote observation points which are assumed to belong to a compact interval $I \subset \mathbb{R}$. The possibly varying strength of the influence of $X_i$ at each measurement point $t_j$ is quantified by different coefficients $\alpha_j$. Frequently $p \gg n$ and/or there is a high degree of collinearity between the "predictors" $X_i(t_j)$, $j = 1, \dots, p$, and standard regression methods are not applicable. In addition, (1.1) may incorporate a discretization error, since one will often have to assume that $Y_i$ also depends on unobserved time points $t$ in between the observation times $t_j$. As pointed out by several authors (Marx and Eilers [22], Ramsay and Silverman [26] or Cuevas, Febrero and Fraiman [10]) the use of functional models for these settings has some advantages over discrete, multivariate approaches. Only in a functional framework is it possible to profit from qualitative assumptions like smoothness of underlying curves. Assuming square integrable functions $X_i$ on $I \subset \mathbb{R}$, the basic object of our study is a functional linear regression model

(1.2)  $Y_i = \alpha_0 + \int_I \alpha(t) X_i(t)\,dt + \epsilon_i, \quad i = 1, \dots, n,$

where the $\epsilon_i$'s are i.i.d. centered random errors, $\mathbb{E}(\epsilon_i) = 0$, with variance $\mathbb{E}(\epsilon_i^2) = \sigma_\epsilon^2$, and $\alpha$ is a square integrable functional parameter defined on $I$ that must be estimated from the pairs $(X_i, Y_i)$, $i = 1, \dots, n$. This type of regression model was first considered in Ramsay and Dalzell [24]. Obviously, (1.2) constitutes a continuous version of (1.1), and both models are linked by

(1.3)  $\epsilon_i^* = d_i + \epsilon_i, \quad \text{where } d_i = \int_I \alpha(t) X_i(t)\,dt - \frac{1}{p}\sum_{j=1}^{p} \alpha(t_j) X_i(t_j)$

may be interpreted as a discretization error, and $\alpha(t_j) = \alpha_j$.

As a consequence of developments of modern technology, data that may be described by functional regression models can be found in many fields such as medicine, linguistics and chemometrics (see, e.g., Ramsay and Silverman [25, 26] and Ferraty and Vieu [14], for several case studies). Similarly to traditional regression problems, model (1.2) may arise under different experimental designs. We assume a random design of the explanatory curves, where $X_1, \dots, X_n$ is a sequence of identically distributed random functions with the same distribution as a generic $X$. The main assumption on $X$ is that it is a second-order variable, that is, $\mathbb{E}(\int_I X^2(t)\,dt) < +\infty$, and it is assumed moreover that $\mathbb{E}(X_i(t)\epsilon_i) = 0$ for almost every $t \in I$. This situation has been considered, for instance, in Cardot, Ferraty and Sarda [7] and Müller and Stadtmüller [23] for independent variables, while correlated functional variables are studied in Bosq [2]. Our analysis is based on a general framework without any assumption of independence of the $X_i$'s. We will, however, assume independence between the $X_i$'s and the $\epsilon_i$'s in our theoretical results in Sections 3 and 4.

The main problem in functional linear regression is to derive an estimator $\hat\alpha$ of the unknown slope function $\alpha$. However, estimation of $\alpha$ in (1.2) belongs to the class of ill-posed inverse problems. Writing (1.2) for generic variables $X$, $Y$ and $\epsilon$, multiplying both sides by $X - \mathbb{E}(X)$ and then taking expectations leads to

(1.4)  $\mathbb{E}\big((Y - \mathbb{E}(Y))(X - \mathbb{E}(X))\big) = \mathbb{E}\Big(\int_I \alpha(t)\big(X(t) - \mathbb{E}(X)(t)\big)\,dt\,\big(X - \mathbb{E}(X)\big)\Big) =: \Gamma(\alpha).$

The normal equation (1.4) is the continuous equivalent of normal equations in the multivariate linear model. Estimation of $\alpha$ is thus linked with the inversion of the covariance operator $\Gamma$ of $X$ defined in (1.4). But, unlike the finite dimensional case, a bounded inverse for $\Gamma$ does not exist since it is a compact linear operator defined on the infinite dimensional space $L^2(I)$. This corresponds to the setup of ill-posed inverse problems (with the additional difficulty that $\Gamma$ is unknown). As a consequence, the parameter $\alpha$ in (1.2) is not identifiable without additional constraint. Actually, a necessary and sufficient condition under which a unique solution for (1.2)-(1.4) exists in the orthogonal space of $\ker(\Gamma)$ is given by

$\sum_r \left( \frac{\mathbb{E}\big((Y - \mathbb{E}(Y)) \int_I (X(t) - \mathbb{E}(X)(t))\,\zeta_r(t)\,dt\big)}{\lambda_r} \right)^2 < +\infty,$

where $(\lambda_r, \zeta_r)_r$ are the eigenelements of $\Gamma$ (see Cardot, Ferraty and Sarda [7] or He, Müller and Wang [19] for a functional response). The set of solutions is the set of functions $\alpha$ which can be decomposed as a sum of the unique element of the orthogonal space of $\ker(\Gamma)$ satisfying (1.4) and any element of $\ker(\Gamma)$.

It follows from these arguments that any sensible procedure for estimating $\alpha$ (or, more precisely, its identifiable part) has to involve regularization. Several authors have proposed estimation procedures where regularization is obtained in two main ways. The first one is based on the Karhunen-Loève expansion of $X$ and leads to regression on functional principal components: see Bosq [2], Cardot, Mas and Sarda [8] or Müller and Stadtmüller [23]. It consists in projecting the observations on a finite dimensional space spanned by eigenfunctions of the (empirical) covariance operator $\Gamma_n$. For the second method, regularization is obtained through a penalized least squares approach after expanding $\alpha$ in some basis (such as splines): see Ramsay and Dalzell [24], Eilers and Marx [12], Cardot, Ferraty and Sarda [7] or Li and Hsing [21]. We propose here to use a smoothing splines approach extending previous work by Cardot et al. [5].

Our estimator is described in Section 2. Note that (1.2) implies that $Y_i - \bar{Y} = \int_I \alpha(t)[X_i(t) - \bar{X}(t)]\,dt + \epsilon_i - \bar{\epsilon}$. Based on the observation times $t_1 < \cdots < t_p$, we rely on minimizing the residual sum of squares $\sum_i \big(Y_i - \bar{Y} - \frac{1}{p}\sum_{j=1}^{p} \alpha(t_j)(X_i(t_j) - \bar{X}(t_j))\big)^2$ subject to a roughness penalty. A slight modification of the usual penalty term is applied in order to guarantee the existence of the estimator under general conditions. The proposed estimator $\hat\alpha$ is then a natural spline with knots at the observation points $t_j$. An estimator of the intercept $\alpha_0 = \mathbb{E}(Y) - \int_I \alpha(t)\mathbb{E}(X)(t)\,dt$ is given by $\hat\alpha_0 = \bar{Y} - \int_I \hat\alpha(t)\bar{X}(t)\,dt$. For simplicity, we will assume that $t_1 < \cdots < t_p$ are equispaced, but the methodology can easily be generalized to other situations. It must be emphasized, however, that our study does not cover the case of sparse points for which other techniques have to be envisaged; for this specific problem, see the work of Yao, Müller and Wang [32].

In Section 3 we present a detailed asymptotic theory of the behavior of our estimator for large values of $n$ and $p$. The distance between $\hat\alpha$ and $\alpha$ is evaluated with respect to semi-norms induced by the operator $\Gamma$, $\|u\|_\Gamma^2 = \langle \Gamma u, u \rangle$ with $\langle u, v \rangle = \int_I u(t)v(t)\,dt$, or its discretized or empirical versions (see, e.g., Cardot, Ferraty and Sarda [7] or Müller and Stadtmüller [23] for similar setups). By using these semi-norms we explicitly concentrate on analyzing the estimation error only for the identifiable part of the structure of $\alpha$ which is relevant for prediction. Indeed, it will be shown in Section 3 that $\|\hat\alpha - \alpha\|_\Gamma^2$ determines the rate of convergence of the error in predicting the conditional mean $\alpha_0 + \int_I \alpha(t)X_{n+1}(t)\,dt$ of $Y_{n+1}$ for any new random function $X_{n+1}$ possessing the same distribution as $X$ and independent of $X_1, \dots, X_n$:

(1.5)  $\mathbb{E}\Big(\big(\hat\alpha_0 + \int_I \hat\alpha(t)X_{n+1}(t)\,dt - \alpha_0 - \int_I \alpha(t)X_{n+1}(t)\,dt\big)^2 \,\Big|\, \hat\alpha_0, \hat\alpha\Big) = \|\hat\alpha - \alpha\|_\Gamma^2 + O_P(n^{-1}).$

We first derive optimal rates of convergence with respect to the semi-norms induced by $\Gamma$ in a quite general setting. These rates substantially improve existing results in the literature as well as bounds obtained for this estimator in a previous paper (see Cardot et al. [5]). If $\alpha$ is $m$-times continuously differentiable, then it is shown that rates of convergence for our estimator are of order $n^{-(2m+2q+1)/(2m+2q+2)}$, where the value of $q > 0$ depends on the structure of the distribution of $X$. More precisely, $q$ quantifies the rate of decrease $\sum_{r=k+1}^{\infty} \lambda_r = O(k^{-2q})$ as $k \to \infty$, where $\lambda_1 \ge \lambda_2 \ge \cdots$ are the eigenvalues of the covariance operator $\Gamma$. If, for example, $X$ is a.s. twice continuously differentiable, then $q \ge 2$. As a second step, we show that these rates of convergence are optimal in the sense that they are minimax over large classes of distributions of $X$ and of functions $\alpha$. No alternative estimator can globally achieve faster rates of convergence in these classes.

In an interesting paper Cai and Hall [4] derive rates of convergence for the error $\hat\alpha_0 + \langle \hat\alpha, x \rangle - \alpha_0 - \langle \alpha, x \rangle$ for a pre-specified, fixed function $x$. Their approach is based on regression with respect to functional principal components and the derived rates are shown to be optimal with respect to this methodology. At first glance this setup seems to be close, but due to the fact that explanatory variables are of infinite dimension, inference on fixed functions $x$ cannot generally be used to derive optimal rates of convergence of the prediction error (1.5) for random functions $X_{n+1}$. We also want to emphasize that in the present paper we do not consider the convergence of $\hat\alpha$ with respect to the usual $L^2$ norm. Analyzing $\|\hat\alpha - \alpha\|^2 = \int_I (\hat\alpha(t) - \alpha(t))^2\,dt$ instead of $\|\hat\alpha - \alpha\|_\Gamma^2$ must be seen statistically as a very different problem, and under our general assumptions it only follows that $\|\hat\alpha - \alpha\|^2$ is bounded in probability (see the proof of Theorem 2). It appears that to get stronger results one needs additional conditions linking the "smoothness" of $\alpha$ and of the curves $X_i$, as derived in a recent work by Hall and Horowitz [18]. A detailed discussion of these issues is given in Section 3.2.

In practice the functional values $X_i(t_j)$ are often not directly observed; there exist only noisy observations $W_{ij} = X_i(t_j) + \delta_{ij}$ contaminated with random errors $\delta_{ij}$. In Section 4, we consider a modified functional linear model adapting to such situations. In this errors-in-variables context, we use a corrected estimator as introduced in Cardot et al. [5] which can be seen as a modified version of the so-called total least squares method for functional data. We show again the good asymptotic performance of the method for a sufficiently dense grid of discretization points.

We devote Section 5 to the application of the proposed estimation proce- 
dure to the prediction of the peak of pollution from the curve of pollutant 
indicators collected the preceding day. Finally, the proofs of our results can 
be found in Section 6. 

2. Smoothing splines estimation of the functional coefficient. As explained in the Introduction, we will assume that the functions $X_i$ are observed at $p$ equidistant points $t_1, \dots, t_p \in I$. In order to simplify further developments, we will take $I = [0, 1]$ so that $t_1 = \frac{1}{2p}$ and $t_j - t_{j-1} = \frac{1}{p}$ for all $j = 2, \dots, p$.

Our estimator of $\alpha$ in (1.2) is a generalization of the well-known smoothing splines estimator in univariate nonparametric regression. It relies on the implicit assumption that the underlying function $\alpha$ is sufficiently smooth, for example, $m$-times continuously differentiable ($m = 1, 2, 3, \dots$).

For any smooth function $a$ the discrete sum $\frac{1}{p}\sum_{j=1}^{p} a(t_j)X_i(t_j)$ is used to approximate the integral $\int_0^1 a(t)X_i(t)\,dt$ in (1.2), whereas expectations are estimated by the sample means $\bar{Y}$ and $\bar{X}$, and an estimate is obtained by minimizing the sum of squared residuals $\big(Y_i - \bar{Y} - \frac{1}{p}\sum_{j=1}^{p} a(t_j)(X_i(t_j) - \bar{X}(t_j))\big)^2$ subject to a roughness penalty. More precisely, for some $m = 1, 2, \dots$ and a smoothing parameter $\rho > 0$, an estimate $\hat\alpha$ is determined by minimizing

(2.1)  $\frac{1}{n}\sum_{i=1}^{n}\Big(Y_i - \bar{Y} - \frac{1}{p}\sum_{j=1}^{p} a(t_j)\big(X_i(t_j) - \bar{X}(t_j)\big)\Big)^2 + \rho\Big(\frac{1}{p}\sum_{j=1}^{p} \pi_a(t_j)^2 + \int_0^1 \big(a^{(m)}(t)\big)^2\,dt\Big)$

over all functions $a$ in the Sobolev space $W^{m,2}([0,1]) \subset L^2([0,1])$, where $\pi_a(t) = \sum_{l=1}^{m} \beta_{a,l}\,t^{l-1}$ with $\sum_{j=1}^{p}(a(t_j) - \pi_a(t_j))^2 = \min_{\beta_1, \dots, \beta_m} \sum_{j=1}^{p}\big(a(t_j) - \sum_{l=1}^{m}\beta_l t_j^{l-1}\big)^2$.

Obviously, $\pi_a$ denotes the best possible approximation of $(a(t_1), \dots, a(t_p))$ by a polynomial of degree $m - 1$. The extra term $\frac{1}{p}\sum_{j=1}^{p}\pi_a(t_j)^2$ in the roughness penalty is unusual and does not appear in traditional smoothing splines approaches. It will, however, be shown below that this term is necessary to guarantee existence of a unique solution in a general context without any additional assumptions on the curves $X_i$.

It is quite easily seen that any solution $\hat\alpha$ of (2.1) has to be an element of the space $NS^m(t_1, \dots, t_p)$ of natural splines of order $2m$ with knots at $t_1, \dots, t_p$. Recall that $NS^m(t_1, \dots, t_p)$ is a $p$-dimensional linear space of functions with $v^{(m)} \in L^2([0,1])$ for any $v \in NS^m(t_1, \dots, t_p)$. Let $\mathbf{b}(t) = (b_1(t), \dots, b_p(t))^\top$ be a functional basis of $NS^m(t_1, \dots, t_p)$. A discussion of several possible basis function expansions can be found in Eubank [13]. An important property of natural splines is that there exists a canonical one-to-one mapping between $\mathbb{R}^p$ and the space $NS^m(t_1, \dots, t_p)$ in the following way: for any vector $\mathbf{w} = (w_1, \dots, w_p)^\top \in \mathbb{R}^p$, there exists a unique natural spline interpolant $s_{\mathbf{w}}$ with $s_{\mathbf{w}}(t_j) = w_j$, $j = 1, \dots, p$. With $\mathbf{B}$ denoting the $p \times p$ matrix with elements $b_i(t_j)$, $s_{\mathbf{w}}$ is given by

(2.2)  $s_{\mathbf{w}}(t) = \mathbf{b}(t)^\top(\mathbf{B}^\top\mathbf{B})^{-1}\mathbf{B}^\top\mathbf{w}.$

The important property of such a spline interpolant is the fact that

(2.3)  $\int_0^1 s_{\mathbf{w}}^{(m)}(t)^2\,dt \le \int_0^1 f^{(m)}(t)^2\,dt$

for any other function $f \in W^{m,2}([0,1])$ with $f(t_j) = w_j$, $j = 1, \dots, p$.

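Note that in (2.1) only the integral $\int_0^1 a^{(m)}(t)^2\,dt$ depends on the values of $a$ in the open intervals $(t_{j-1}, t_j)$ between grid points. It therefore follows from (2.3) that $\hat\alpha = s_{\hat{\boldsymbol\alpha}}$, where $\hat{\boldsymbol\alpha} = (\hat\alpha(t_1), \dots, \hat\alpha(t_p))^\top \in \mathbb{R}^p$ minimizes

(2.4)  $\frac{1}{n}\sum_{i=1}^{n}\Big(Y_i - \bar{Y} - \frac{1}{p}\sum_{j=1}^{p} a(t_j)\big(X_i(t_j) - \bar{X}(t_j)\big)\Big)^2 + \rho\Big(\frac{1}{p}\sum_{j=1}^{p} \pi_a(t_j)^2 + \int_0^1 s_{\mathbf{a}}^{(m)}(t)^2\,dt\Big)$

with respect to all vectors $\mathbf{a} = (a(t_1), \dots, a(t_p))^\top \in \mathbb{R}^p$.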

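A closer study of $\hat{\boldsymbol\alpha}$ requires the use of matrix notation: $\mathbf{Y} = (Y_1 - \bar{Y}, \dots, Y_n - \bar{Y})^\top$, $\mathbf{X}_i = (X_i(t_1) - \bar{X}(t_1), \dots, X_i(t_p) - \bar{X}(t_p))^\top$ for all $i = 1, \dots, n$, $\boldsymbol\alpha = (\alpha(t_1), \dots, \alpha(t_p))^\top$, $\boldsymbol\epsilon = (\epsilon_1 - \bar\epsilon, \dots, \epsilon_n - \bar\epsilon)^\top$, and let $\mathbf{X}$ be the $n \times p$ matrix with general term $X_i(t_j) - \bar{X}(t_j)$ for all $i = 1, \dots, n$, $j = 1, \dots, p$. Moreover, $\mathbf{P}_m$ will denote the $p \times p$ projection matrix projecting into the $m$-dimensional linear space $E_m := \{\mathbf{w} = (w_1, \dots, w_p)^\top \in \mathbb{R}^p \mid w_j = \sum_{l=1}^{m}\theta_l t_j^{l-1},\ j = 1, \dots, p\}$ of all (discretized) polynomials of degree $m - 1$. By (2.2), we have $\int_0^1 s_{\mathbf{a}}^{(m)}(t)^2\,dt = \mathbf{a}^\top\mathbf{A}_m^*\mathbf{a}$, where $\mathbf{A}_m^* = \mathbf{B}(\mathbf{B}^\top\mathbf{B})^{-1}\big[\int_0^1 \mathbf{b}^{(m)}(t)\mathbf{b}^{(m)}(t)^\top\,dt\big](\mathbf{B}^\top\mathbf{B})^{-1}\mathbf{B}^\top$ is a $p \times p$ matrix.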

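When defining $\mathbf{A}_m := \mathbf{P}_m + p\mathbf{A}_m^*$, minimizing (2.4) is equivalent to solving

(2.5)  $\min_{\mathbf{a} \in \mathbb{R}^p}\Big\{\frac{1}{n}\Big\|\mathbf{Y} - \frac{1}{p}\mathbf{X}\mathbf{a}\Big\|^2 + \frac{\rho}{p}\,\mathbf{a}^\top\mathbf{A}_m\mathbf{a}\Big\},$

where $\|\cdot\|$ stands for the usual Euclidean norm. The solution is given by

(2.6)  $\hat{\boldsymbol\alpha} = \frac{1}{np}\Big(\frac{1}{np^2}\mathbf{X}^\top\mathbf{X} + \frac{\rho}{p}\mathbf{A}_m\Big)^{-1}\mathbf{X}^\top\mathbf{Y} = \frac{1}{n}\Big(\frac{1}{np}\mathbf{X}^\top\mathbf{X} + \rho\mathbf{A}_m\Big)^{-1}\mathbf{X}^\top\mathbf{Y}.$

Then $\hat\alpha = s_{\hat{\boldsymbol\alpha}}$ constitutes our final estimator of $\alpha$ while $\hat\alpha_0 = \bar{Y} - \langle\hat\alpha, \bar{X}\rangle$ is used to estimate the intercept $\alpha_0$. Based on a somewhat different development, this estimator of $\alpha$ has already been proposed by Cardot et al. [5].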

In order to verify existence of $\hat{\boldsymbol\alpha}$, let us first cite some properties of the eigenvalues of $p\mathbf{A}_m^*$ which have been studied by many authors (see Eubank [13]). For instance, in Utreras [28], it is shown that this matrix has exactly $m$ zero eigenvalues $\mu_{1,p} = \cdots = \mu_{m,p} = 0$. The corresponding $m$-dimensional eigenspace is the space $E_m$ of discretized polynomials as defined above. The $p - m$ nonzero eigenvalues $0 < \mu_{m+1,p} < \cdots < \mu_{p,p}$ are such that there exist constants $0 < D_0 < D_1 < \infty$ such that $D_0 \le \mu_{j+m,p}(\pi j)^{-2m} \le D_1$ for $j = 1, \dots, p - m$ and all sufficiently large $p$. Therefore, there exist some constant $0 < C_0 < +\infty$ and some $p_0 \in \{0, 1, 2, \dots\}$ such that for all $p \ge p_0$ and $k = 0, \dots, p - m - 1$

(2.7) i <Co. 

We can conclude that all eigenvalues of the matrix $\mathbf{A}_m$ are strictly positive, and existence as well as uniqueness of the solution (2.6) of the minimization problem (2.5) are straightforward consequences. Note that introduction of the additional term $\frac{1}{p}\sum_{j=1}^{p}\pi_a(t_j)^2$ in (2.1) is crucial. Dropping this term in (2.1) as well as (2.4) results in replacing $\mathbf{A}_m$ by $p\mathbf{A}_m^*$ in (2.5). Existence of a solution then cannot be guaranteed in a general context since, due to the $m$ zero eigenvalues of $p\mathbf{A}_m^*$, the matrix $\big(\frac{1}{np}\mathbf{X}^\top\mathbf{X} + \rho p\mathbf{A}_m^*\big)$ may not be invertible.
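
The role of the extra penalty term can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the implementation used in the paper: the natural spline matrix $p\mathbf{A}_m^*$ is replaced by a discrete stand-in $p^{2m}\mathbf{D}_m^\top\mathbf{D}_m$ built from $m$-th order differences, whose null space is likewise the space $E_m$ of discretized polynomials of degree $m - 1$ on an equispaced grid. The function names `projection_on_polynomials`, `difference_penalty` and `fit_slope`, the midpoint grid and the simulated constant curves are all hypothetical choices made for illustration.

```python
import numpy as np

def projection_on_polynomials(t, m):
    """Projection matrix P_m onto discretized polynomials of degree m - 1."""
    V = np.vander(t, N=m, increasing=True)   # columns 1, t, ..., t^(m-1)
    Q, _ = np.linalg.qr(V)                   # orthonormal basis of E_m
    return Q @ Q.T

def difference_penalty(p, m):
    """Assumed stand-in for p * A*_m: scaled squared m-th order differences."""
    D = np.diff(np.eye(p), n=m, axis=0)      # (p - m) x p difference operator
    return p ** (2 * m) * (D.T @ D)

def fit_slope(X, Y, rho, m=2):
    """Solve the linear system in (2.6) with A_m = P_m + penalty stand-in."""
    n, p = X.shape
    t = (np.arange(1, p + 1) - 0.5) / p      # assumed grid: t_1 = 1/(2p), step 1/p
    Xc = X - X.mean(axis=0)                  # centered curves
    Yc = Y - Y.mean()                        # centered responses
    A_m = projection_on_polynomials(t, m) + difference_penalty(p, m)
    G = Xc.T @ Xc / (n * p)
    return np.linalg.solve(G + rho * A_m, Xc.T @ Yc) / n

rng = np.random.default_rng(0)
n, p, m, rho = 40, 30, 2, 1e-2
t = (np.arange(1, p + 1) - 0.5) / p

# Degenerate design: every curve is constant in t, so X^T X has rank one.
X = rng.normal(size=(n, 1)) * np.ones((1, p))
Y = X.mean(axis=1) + 0.1 * rng.normal(size=n)

Xc = X - X.mean(axis=0)
G = Xc.T @ Xc / (n * p)
pen = difference_penalty(p, m)
A_m = projection_on_polynomials(t, m) + pen

pen_eig = np.linalg.eigvalsh(pen)
zero_count = int(np.sum(pen_eig < 1e-8 * pen_eig.max()))       # number of zero eigenvalues
min_eig_without = np.linalg.eigvalsh(G + rho * pen).min()      # usual penalty only
min_eig_with = np.linalg.eigvalsh(G + rho * A_m).min()         # modified penalty
alpha_hat = fit_slope(X, Y, rho, m)

print(zero_count, min_eig_without, min_eig_with)
```

In this degenerate design the system matrix built from the usual penalty alone is numerically singular, because the linear direction lies both in the null space of the penalty and in the null space of $\mathbf{X}^\top\mathbf{X}$, whereas adding the projection term keeps the smallest eigenvalue bounded away from zero.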



Remark. Our requirement of equidistant grid points $t_j$ has to be seen as a restrictive condition. There are many applications where the functions $X_i$ are only observed at varying numbers $p_i$ of irregularly spaced points $t_{i1} < \cdots < t_{ip_i}$. Then our estimation procedure is not directly applicable. Fortunately there exists a fairly simple modification. Define a smooth function $\tilde{X}_i \in L^2([0,1])$ by smoothly interpolating the observations (e.g., using natural splines) such that $\tilde{X}_i(t_{ij}) = X_i(t_{ij})$, $j = 1, \dots, p_i$. Then define $p > \max\{p_1, \dots, p_n\}$ equidistant grid points $t_1, \dots, t_p$, and determine an estimator $\hat\alpha$ by applying the smoothing spline procedure (2.1) with $\frac{1}{p}\sum_{j=1}^{p} a(t_j)(X_i(t_j) - \bar{X}(t_j))$ being replaced by $\frac{1}{p}\sum_{j=1}^{p} a(t_j)(\tilde{X}_i(t_j) - \bar{\tilde{X}}(t_j))$. For example, in the case of a random design with i.i.d. observations $t_{ij}$ from a strictly positive design density on $I$, it may be shown that the asymptotic results of Section 3 generalize to this situation if $\min\{p_1, \dots, p_n\}$ is sufficiently large compared to $n$. A detailed analysis is not in the scope of the present paper.

3. Theoretical results. 

3.1. Rates of convergence for smoothing splines estimators. We will denote the standard inner product of the Hilbert space $L^2([0,1])$ by $\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$ and its associated norm by $\|\cdot\|$. As outlined in the Introduction, our analysis is based on evaluating the error between $\hat\alpha$ and $\alpha$ with respect to the semi-norm $\|\cdot\|_\Gamma$ defined in Section 1,

$\|u\|_\Gamma^2 := \langle \Gamma u, u\rangle, \quad u \in L^2([0,1]),$

where $\Gamma$ is the covariance operator of $X$ given by

$\Gamma u := \mathbb{E}\big(\langle X - \mathbb{E}(X), u\rangle\,(X - \mathbb{E}(X))\big), \quad u \in L^2([0,1]).$

The above $L^2$ semi-norm has already been used in contexts similar to the one studied in the present paper; see, for example, Wahba [30], Cardot, Ferraty and Sarda [7] or Müller and Stadtmüller [23]. By (1.5) the asymptotic behavior of $\|\hat\alpha - \alpha\|_\Gamma^2$ constitutes a major object of interest, since it quantifies the leading term in the expected squared prediction error for a new random function $X_{n+1}$.

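As first steps, we will consider in Theorems 1 and 2 the error between $\hat\alpha$ and $\alpha$ with respect to simplified versions of the above semi-norm: the discretized empirical semi-norm defined for any $\mathbf{u} \in \mathbb{R}^p$ as

$\|\mathbf{u}\|_{\Gamma_{n,p}}^2 := \frac{1}{p}\,\mathbf{u}^\top\Big(\frac{1}{np}\mathbf{X}^\top\mathbf{X}\Big)\mathbf{u},$

and the empirical semi-norm defined for any $u \in L^2([0,1])$ as

$\|u\|_{\Gamma_n}^2 := \frac{1}{n}\sum_{i=1}^{n}\langle X_i - \bar{X}, u\rangle^2 = \langle \Gamma_n u, u\rangle,$

where $\Gamma_n$ is the empirical covariance operator from $X_1, \dots, X_n$ given by

$\Gamma_n u := \frac{1}{n}\sum_{i=1}^{n}\langle X_i - \bar{X}, u\rangle\,(X_i - \bar{X}).$

Obviously, $\|\hat{\boldsymbol\alpha} - \boldsymbol\alpha\|_{\Gamma_{n,p}}^2 = \frac{1}{n}\sum_i\big[\frac{1}{p}\sum_j(\hat\alpha(t_j) - \alpha(t_j))(X_i(t_j) - \bar{X}(t_j))\big]^2$ and $\|\hat\alpha - \alpha\|_{\Gamma_n}^2 = \frac{1}{n}\sum_i\big[\int_0^1(\hat\alpha(t) - \alpha(t))(X_i(t) - \bar{X}(t))\,dt\big]^2$ quantify different modes of convergence of $\langle\hat\alpha, X_i - \bar{X}\rangle$ to $\langle\alpha, X_i - \bar{X}\rangle$.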
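
The agreement between the matrix form and the sum form of the discretized empirical semi-norm can be checked numerically. The sketch below is only an illustration under assumptions: the curves are simulated random walks, the test vector is a discretized sine, and the function names are hypothetical.

```python
import numpy as np

def discretized_seminorm_sq(u, X):
    """Matrix form: (1/p) u^T ((1/(np)) Xc^T Xc) u with column-centered Xc."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    return float(u @ (Xc.T @ Xc / (n * p)) @ u / p)

def mean_squared_discrete_inner_product(u, X):
    """Sum form: (1/n) sum_i [(1/p) sum_j u_j (X_i(t_j) - Xbar(t_j))]^2."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    return float(np.mean((Xc @ u / p) ** 2))

rng = np.random.default_rng(1)
n, p = 25, 40
t = (np.arange(1, p + 1) - 0.5) / p                           # assumed equispaced grid
X = np.cumsum(rng.normal(size=(n, p)), axis=1) / np.sqrt(p)   # simulated rough curves
u = np.sin(2 * np.pi * t)                                     # arbitrary test vector

lhs = discretized_seminorm_sq(u, X)
rhs = mean_squared_discrete_inner_product(u, X)
print(lhs, rhs)
```

Both functions return the same nonnegative number up to floating-point error, which is the discrete analogue of the identity $\|u\|_{\Gamma_n}^2 = \langle\Gamma_n u, u\rangle$.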

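As mentioned in Section 2, the function $\alpha$ is required to have a certain degree of regularity. Namely, it satisfies the following assumption for some $m \in \{1, 2, \dots\}$:

(A.1) $\alpha$ is $m$-times differentiable and $\alpha^{(m)}$ belongs to $L^2([0,1])$.

Let $C_1 = \int_0^1 \alpha^{(m)}(t)^2\,dt$ and $C_2^* = \int_0^1 \alpha(t)^2\,dt$. By construction of $\mathbf{P}_m$, $\mathbf{P}_m\boldsymbol\alpha$ provides the best approximation (in a least squares sense) of $\boldsymbol\alpha$ by (discretized) polynomials of degree $m - 1$, and $\frac{1}{p}\boldsymbol\alpha^\top\mathbf{P}_m\boldsymbol\alpha \le \frac{1}{p}\boldsymbol\alpha^\top\boldsymbol\alpha \to C_2^*$ as $p \to \infty$. Let $C_2$ denote an arbitrary constant with $C_2^* < C_2 < \infty$. There then exists a $p_1 \in \{0, 1, \dots\}$ with $p_1 \ge p_0$ such that $\frac{1}{p}\boldsymbol\alpha^\top\mathbf{P}_m\boldsymbol\alpha \le C_2$ for all $p \ge p_1$.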

Recall that our basic setup implies that $X_1, \dots, X_n$ are identically distributed random functions with the same distribution as a generic variable $X$. Expected values $\mathbb{E}_\epsilon(\cdot)$ as stated in the theorems below will refer to the probability distribution induced by the random variable $\epsilon$, that is, they stand for conditional expectation given $X_1, \dots, X_n$. We assume moreover that $\epsilon_i$ is independent of the $X_i$'s. In the following, for any real positive number $x$, $[x]$ will denote the smallest integer which is larger than $x$. In addition, let $\lambda_{x,1} \ge \lambda_{x,2} \ge \cdots \ge \lambda_{x,p} \ge 0$ denote the eigenvalues of the matrix $\frac{1}{np}\mathbf{X}^\top\mathbf{X}$. We start with a theorem giving finite sample bounds for bias and variance of the estimator $\hat{\boldsymbol\alpha}$ with respect to the semi-norm $\|\cdot\|_{\Gamma_{n,p}}$.

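Theorem 1. Under assumption (A.1) and the above definitions of $C_0$, $C_1$, $C_2$, $p_1$, the following bounds hold for all $n = 0, 1, \dots$, all $p \ge p_1$, all $\rho > n^{-2m}$ and every $n \times p$ matrix $\mathbf{X} = (X_i(t_j))_{i,j}$:

(3.1)  $\|\mathbb{E}_\epsilon(\hat{\boldsymbol\alpha}) - \boldsymbol\alpha\|_{\Gamma_{n,p}}^2 \le 2\rho\Big(\frac{1}{p}\boldsymbol\alpha^\top\mathbf{P}_m\boldsymbol\alpha + C_1\Big) + \frac{4}{n}\sum_{i=1}^{n}(d_i - \bar{d})^2 \le 2\rho(C_2 + C_1) + \frac{4}{n}\sum_{i=1}^{n}(d_i - \bar{d})^2,$

as well as

(3.2)  $\mathbb{E}_\epsilon\big(\|\hat{\boldsymbol\alpha} - \mathbb{E}_\epsilon(\hat{\boldsymbol\alpha})\|_{\Gamma_{n,p}}^2\big) \le \frac{\sigma_\epsilon^2}{n}\Big(m + \big[\rho^{-1/(2m+2q+1)}\big](2 + C \cdot C_0)\Big)$

for any $C > 0$ and $q \ge 0$ with the property that $\sum_{j=k+1}^{p}\lambda_{x,j} \le C \cdot k^{-2q}$ holds for $k := [\rho^{-1/(2m+2q+1)}]$.

The rate of convergence of $\|\hat{\boldsymbol\alpha} - \boldsymbol\alpha\|_{\Gamma_{n,p}}^2$ thus depends on assumptions on the distribution of $X$ and on the size of the discretization error. In order to complement our basic setup, we will rely on the following conditions:

(A.2) There exists some constant $\kappa$, $0 < \kappa \le 1$, such that for every $\delta > 0$, there exists a constant $C_3 < +\infty$ such that

$P\big(|X(t) - X(s)| \le C_3|t - s|^\kappa,\ t, s \in I\big) \ge 1 - \delta.$

(A.3) For some constant $C_4 < \infty$ and all $k = 1, 2, \dots$ there is a $k$-dimensional linear subspace $\mathcal{L}_k$ of $L^2([0,1])$ with

$\mathbb{E}\Big(\inf_{f \in \mathcal{L}_k}\sup_t |X(t) - f(t)|^2\Big) \le C_4 k^{-2q}.$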

Before proceeding any further, let us consider assumption (A.3) more closely. The following lemma provides a link between assumption (A.3) and the degree of smoothness of the random functions $X_i$.

Lemma 1. For some $q_1 = 0, 1, 2, \dots$ and $0 \le r_2 \le 1$ assume that $X$ is almost surely $q_1$-times continuously differentiable and that there exists some $C_5 < \infty$ such that

$\mathbb{E}\Big(\sup_{|t-s| \le d}|X^{(q_1)}(t) - X^{(q_1)}(s)|^2\Big) \le C_5 d^{2r_2}$

holds for all $d > 0$. There then exists a constant $C_6 < \infty$, depending only on $q_1$, such that for all $k = 1, 2, \dots$

$\mathbb{E}\Big(\inf_{f \in \mathcal{L}_k}\sup_t |X(t) - f(t)|^2\Big) \le C_6 C_5 k^{-2(q_1 + r_2)},$

where $\mathcal{L}_k$ denotes the space of all polynomials of order $k$ on $[0, 1]$.

Proof. The well-known Jackson's inequality in approximation theory implies the existence of some $C_6 < \infty$, only depending on $q_1$, such that for all $k = 1, 2, \dots$

$\inf_{f \in \mathcal{L}_k}\sup_t |X(t) - f(t)|^2 \le C_6 k^{-2q_1}\sup_{|t-s| \le 1/k}|X^{(q_1)}(t) - X^{(q_1)}(s)|^2$

holds with probability 1. The lemma is an immediate consequence. □

The lemma implies that if assumption (A.2) can be replaced by the stronger requirement $\mathbb{E}\big(\sup_{|t-s| \le d}|X(t) - X(s)|^2\big) \le C_5 d^{2\kappa}$, $d > 0$, then assumption (A.3) necessarily holds for some $q \ge \kappa$. Indeed, $q \gg \kappa$ will result from a very high degree of smoothness of $X_i$.

On the other hand, assumption (A.3) only requires that the functions $X_i$ be well approximated by some arbitrary low dimensional linear function spaces (not necessarily polynomials). Even if the $X_i$ are not smooth, assumption (A.3) may be satisfied for a large value of $q$ (the Brownian motion provides an example).

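Theorem 1 together with assumptions (A.2) and (A.3) now allows us to derive rates of convergence of our estimator $\hat\alpha$. First note that assumption (A.3) determines the rate of decrease of the eigenvalues $\lambda_{x,j}$ of $\frac{1}{np}\mathbf{X}^\top\mathbf{X}$.

For any $k$-dimensional linear space $\mathcal{L}_k \subset L^2([0,1])$, let $\mathcal{P}_k$ denote the corresponding $p \times p$ projection matrix projecting into the $k$-dimensional subspace $\mathcal{L}_{k,p} = \{\mathbf{v} \in \mathbb{R}^p \mid \mathbf{v} = (f(t_1), \dots, f(t_p))^\top,\ f \in \mathcal{L}_k\}$. Basic properties of eigenvalues and eigenvectors then imply that

(3.3)  $\sum_{j=k+1}^{p}\lambda_{x,j} \le \inf_{\mathcal{L}_k}\mathrm{Tr}\Big((I_p - \mathcal{P}_k)\frac{1}{np}\mathbf{X}^\top\mathbf{X}\Big) = \inf_{\mathcal{L}_k}\frac{1}{n}\sum_{i=1}^{n}\inf_{f \in \mathcal{L}_k}\frac{1}{p}\sum_{j=1}^{p}\big(X_i(t_j) - \bar{X}(t_j) - f(t_j)\big)^2,$

and assumption (A.3) implies that for any $\delta > 0$ there exists a $C_\delta < \infty$ such that $P\big(\sum_{j=k+1}^{p}\lambda_{x,j} \le C_\delta k^{-2q}\big) \ge 1 - \delta$.

Assumptions (A.1) and (A.2) obviously lead to

(3.4)  $\frac{1}{n}\sum_{i=1}^{n}(d_i - \bar{d})^2 = O_P(p^{-2\kappa}).$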

If $n, p \to \infty$, $\rho \to 0$, $1/(n\rho) \to 0$, then relations (3.1), (3.2) and (3.3) imply that

$\|\hat{\boldsymbol\alpha} - \boldsymbol\alpha\|_{\Gamma_{n,p}}^2 = O_P\big(\rho + (n\rho^{1/(2m+2q+1)})^{-1} + p^{-2\kappa}\big).$

In the following we will require that $p$ is sufficiently large compared to $n$ so that the discretization error is negligible. It therefore suffices that $np^{-2\kappa} = O(1)$ as $n, p \to \infty$. This condition imposes a large number $p$ of observation points if $\kappa$ is small. However, if the functions $X_i$ are smooth enough such that $\kappa = 1$, then $np^{-2\kappa} = O(1)$ is already fulfilled if $\frac{n}{p^2} = O(1)$ as $n, p \to \infty$, which does not seem to be restrictive in view of practical applications. The above result then becomes

(3.5)  $\|\hat{\boldsymbol\alpha} - \boldsymbol\alpha\|_{\Gamma_{n,p}}^2 = O_P\big(\rho + (n\rho^{1/(2m+2q+1)})^{-1}\big).$

Choosing $\rho \sim n^{-(2m+2q+1)/(2m+2q+2)}$, we can conclude that

(3.6)  $\|\hat{\boldsymbol\alpha} - \boldsymbol\alpha\|_{\Gamma_{n,p}}^2 = O_P\big(n^{-(2m+2q+1)/(2m+2q+2)}\big).$

The next theorem studies the behavior of the estimator for the empirical $L^2$-norm $\|\cdot\|_{\Gamma_n}$. It is shown that if $p$ is sufficiently large compared to $n$, then based on an optimal choice of $\rho$, the rate of convergence given in (3.6) generalizes to the semi-norm $\|\cdot\|_{\Gamma_n}$.

Theorem 2. Assume (A.1)-(A.3) as well as $np^{-2\kappa} = O(1)$, $\rho \to 0$, $1/(n\rho) \to 0$ as $n, p \to \infty$. Then

(3.7)  $\|\hat\alpha - \alpha\|_{\Gamma_n}^2 = O_P\big(\rho + (n\rho^{1/(2m+2q+1)})^{-1}\big).$
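
The choice of the smoothing parameter behind (3.6), which carries over to the bound (3.7), comes from balancing the two terms of the bound. As a short worked step, written in the notation of the text with $\rho$ the smoothing parameter and no further assumptions, equating the bias-type term and the variance-type term gives:

```latex
\begin{align*}
\rho = \bigl(n\,\rho^{1/(2m+2q+1)}\bigr)^{-1}
&\;\Longleftrightarrow\;
\rho^{(2m+2q+2)/(2m+2q+1)} = n^{-1} \\
&\;\Longleftrightarrow\;
\rho = n^{-(2m+2q+1)/(2m+2q+2)} .
\end{align*}
```

With this choice both terms are of order $n^{-(2m+2q+1)/(2m+2q+2)}$. For instance, $m = 2$ and $q = 2$ give the exponent $9/10$.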

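We finally investigate in the next theorem the behavior of $\|\hat\alpha - \alpha\|_\Gamma^2$. The following assumption describes the additional conditions used to derive our results. It is well known that the covariance operator $\Gamma$ is a nuclear, self-adjoint and nonnegative Hilbert-Schmidt operator. We will use $\zeta_1, \zeta_2, \dots$ to denote a complete orthonormal system of eigenfunctions of $\Gamma$ corresponding to the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots$.

(A.4) There exists a constant $C_7 < \infty$ such that

(3.8)  $\mathrm{Var}\Big(\frac{1}{n}\sum_{i=1}^{n}\langle X_i - \mathbb{E}(X), \zeta_r\rangle\langle X_i - \mathbb{E}(X), \zeta_s\rangle\Big) \le \frac{C_7}{n}\,\mathbb{E}\big(\langle X - \mathbb{E}(X), \zeta_r\rangle^2\big)\,\mathbb{E}\big(\langle X - \mathbb{E}(X), \zeta_s\rangle^2\big)$

holds for all $n$ and all $r, s = 1, 2, \dots$. Moreover, $\|\bar{X} - \mathbb{E}(X)\|^2 = O_P(n^{-1})$.

Relation (3.8) establishes a moment condition. It is necessarily fulfilled if $X_1, \dots, X_n$ are i.i.d. Gaussian random functions. Then $\langle X_i - \mathbb{E}(X), \zeta_r\rangle \sim N\big(0, \mathbb{E}(\langle X_i - \mathbb{E}(X), \zeta_r\rangle^2)\big)$, and $\langle X_i - \mathbb{E}(X), \zeta_r\rangle$ is independent of $\langle X_i - \mathbb{E}(X), \zeta_s\rangle$ if $r \ne s$. Relation (3.8) then is an immediate consequence.

However, the validity of (3.8) does not require independence of the functions $X_i$. For example, in the Gaussian case, (3.8) may also be verified if $\mathrm{Cov}\big(\langle X_i - \mathbb{E}(X), \zeta_r\rangle\langle X_i - \mathbb{E}(X), \zeta_s\rangle,\ \langle X_j - \mathbb{E}(X), \zeta_r\rangle\langle X_j - \mathbb{E}(X), \zeta_s\rangle\big) \le C_7\,\mathbb{E}(\langle X_i - \mathbb{E}(X), \zeta_r\rangle^2)\,\mathbb{E}(\langle X_i - \mathbb{E}(X), \zeta_s\rangle^2) \cdot q^{|i-j|}$ for some $0 < q < 1$, $C_7 < \infty$ and all $i, j$. This is of importance in our application to ozone pollution forecasting which deals with a time series of functions $X_1, \dots, X_n$.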

Theorem 3. Under the conditions of Theorem 2 together with assumption (A.4) we have

(3.9)  $\|\hat\alpha - \alpha\|_\Gamma^2 = O_P\big(\rho + (n\rho^{1/(2m+2q+1)})^{-1} + n^{-(2q+1)/2}\big).$

Furthermore, (1.5) holds for any random function $X_{n+1}$ possessing the same distribution as $X$ and independent of $X_1, \dots, X_n$.

Theorem 3 shows that if $2q \ge 1$ and $\rho \sim n^{-(2m+2q+1)/(2m+2q+2)}$, then the prediction error can be bounded by

$\mathbb{E}\big((\hat\alpha_0 + \langle\hat\alpha, X_{n+1}\rangle - \alpha_0 - \langle\alpha, X_{n+1}\rangle)^2 \mid \hat\alpha_0, \hat\alpha\big) = O_P\big(n^{-(2m+2q+1)/(2m+2q+2)}\big).$

3.2. Optimality of the rates of convergence. For simplicity we will rely on the special case of (1.2) with $\alpha_0 = 0$. In this case $\mathbb{E}\big((\langle\alpha, X_{n+1}\rangle - \hat\alpha_0 - \langle\hat\alpha, X_{n+1}\rangle)^2 \mid \hat\alpha_0, \hat\alpha\big) \ge \|\hat\alpha - \alpha\|_\Gamma^2$ if $X$ possesses a centered distribution with $\mathbb{E}(X) = 0$. In Proposition 1 below we then show that for suitable Sobolev spaces of functions $\alpha$ and a large class of possible distributions of $X_i$, the rate $n^{-(2m+2q+1)/(2m+2q+2)}$ is a lower bound for the rate of convergence of the prediction error over all estimators of $\alpha$ to be computed from corresponding observations $(X_i, Y_i)$, $i = 1, \dots, n$. Consequently, the rate attained by our smoothing spline estimator $\hat\alpha$ must be interpreted as a minimax rate over these classes.

We first have to introduce some additional notation. For simplicity, we will assume that the functions $X_i(t)$ are known for all $t$ so that the number $p$ of observation points may be chosen arbitrarily large. We will use $\mathcal{C}_{m,D}$ to denote the space of all $m$-times continuously differentiable functions $\alpha$ with $\int_0^1 \alpha^{(j)}(t)^2\,dt \le D$ for all $j = 0, 1, \dots, m$. Furthermore, let $\mathcal{P}_{q,C}$ denote the space of all centered probability distributions on $L^2([0,1])$ with the properties that (a) the sequence of eigenvalues of the corresponding covariance operator satisfies $\sum_{j=k+1}^{\infty}\lambda_j \le Ck^{-2q}$ for all sufficiently large $k$, and that (b) the smoothing spline estimator $\hat\alpha$ satisfies $\|\hat\alpha - \alpha\|_\Gamma^2 = O_P\big(n^{-(2m+2q+1)/(2m+2q+2)}\big)$ for $\alpha \in \mathcal{C}_{m,D}$ and $\rho \sim n^{-(2m+2q+1)/(2m+2q+2)}$ (whenever $p$ is chosen sufficiently large compared to $n$). Finally, for given $\alpha \in \mathcal{C}_{m,D}$, probability distribution $P \in \mathcal{P}_{q,C}$ and i.i.d. random functions $X_1, \dots, X_n$, $X_i \sim P$, let $\hat{a}(\alpha, P)$ denote an arbitrary estimator of $\alpha$ based on corresponding data $(X_i, Y_i)$, $i = 1, \dots, n$, generated by (1.2) (with $\alpha_0 = 0$).

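Proposition 1. Let $c_n$ denote an arbitrary sequence of positive numbers with $c_n \to 0$ as $n \to \infty$, and let $2q = 1, 3, 5, \dots$. Under the above assumptions, we have

$\lim_{n \to \infty}\ \sup_{P \in \mathcal{P}_{q,C}}\ \sup_{\alpha \in \mathcal{C}_{m,D}}\ \inf_{\hat{a}(\alpha, P)} P\big(\|\alpha - \hat{a}(\alpha, P)\|_\Gamma^2 \ge c_n \cdot n^{-(2m+2q+1)/(2m+2q+2)}\big) = 1.$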

It is of interest to compare our results with those of Cai and Hall [4] who analyze the error $\langle\hat\alpha - \alpha, x\rangle^2$ for a fixed curve $x$. Similarly to our results, the rate of decrease of the eigenvalues $\lambda_r$ of $\Gamma$ plays an important role. Note that, as shown in the proof of Theorem 3, assumption (A.3) yields $\sum_{r=k+1}^{\infty}\lambda_r = O(k^{-2q})$. Since $\lambda_1 \ge \lambda_2 \ge \cdots$ this in turn implies that $\lambda_r = O(r^{-2q-1})$, and one may reasonably assume that $B^{-1}r^{-2q-1} \le \lambda_r \le Br^{-2q-1}$ for some $0 < B < \infty$. However, Cai and Hall [4] measure "smoothness" of $\alpha$ in terms of a spectral decomposition $\alpha(t) = \sum_r \alpha_r\zeta_r(t)$ and not with respect to usual smoothness classes. Their quantity of interest is the rate $\beta > 1$ of decrease $|\alpha_r| = O(r^{-\beta})$ as $r \to \infty$. But recall that the error in expanding an $m$-times continuously differentiable function with respect to $k$ suitable basis functions (e.g., orthogonal polynomials or Fourier functions) is of an order of at most $k^{-2m}$. For the sake of comparison, assume that $\zeta_1, \zeta_2, \dots$ define an appropriate basis for approximating smooth functions and that $\inf_{f \in \mathrm{span}\{\zeta_1, \dots, \zeta_k\}}\|\alpha - f\|^2 = \sum_{r=k+1}^{\infty}\alpha_r^2 = O(k^{-2m})$. This will require that $\alpha_r^2 = O(r^{-2m-1})$ and, hence, $2\beta = 2m + 1$.

Results as derived by Cai and Hall [4] additionally depend on the spectral decomposition $x(t) = \sum_r x_r\zeta_r(t)$ of a function $x$ of interest. The essential condition on the structure of the coefficients $x_r$ may be re-expressed in the following form: There exist some $\nu \in \mathbb{R}$ and $0 < D_0 < \infty$ such that $D_0^{-1}r^\nu \le \frac{x_r^2}{\lambda_r} \le D_0 r^\nu$ for all $r = 1, 2, \dots$. Rates of convergence then follow from the magnitude of $\nu$, and it is shown that parametric rates (or $n^{-1}\log n$) are achieved if $\nu < -1$.

Now consider a random function $X_{n+1}$ and assume that the underlying distribution is Gaussian. It is then well known that $X_{n+1}(t) = \sum_r X_{n+1,r}\zeta_r(t)$ for independent $N(0,\lambda_r)$-distributed coefficients $X_{n+1,r}$. Consequently,

are i.i.d. Xi-distributed variables for all r = 1,2, . . . , and if u <0 we obtain 

FiD^^r" < < Dor" for ah r = 1, 2, . . .) = for all < L>o < 00. This 

already shows that parametric rates cannot be achieved for the error 
(a — a,Xn+i)'^. On the other hand, for arbitrary u > and < 6 < 1 we 

have FiDQ^r" < < Dor" for all r = 1, 2, . . .) > <5, whenever Do is suffi- 

ciently large. If B~^r~'^'^~^ < K < Br~'^'^~^ and = Op(r~^'"+^), then for 

a function x with D^ r" < < Dor" , u > 0, the convergence rates of 

Cai and Hall [4] translate into 

(S - a,x)2 = Op(^-(2m+2,+l-2.)/(2™+2g+2))^ 

which provides an additional motivation for the fact that the rates derived in our paper constitute a lower bound. For non-Gaussian distributions a comparison is more difficult, since under assumption (A.4) only the Chebyshev

inequality may be used to bound the probabilities r" < -^y^ < Dor". 

Another statistically very different problem consists in an optimal estimation of $\alpha$ by $\hat\alpha$ with respect to the usual $L^2$-norm. In a recent work, Hall and Horowitz [18] derive optimal rates of convergence of $\|\hat\alpha-\alpha\|^2$. These rates again depend on the rate of decrease $|\alpha_r| = O(r^{-\beta})$. Recall that our assumptions do not provide any link between $\alpha$ and $X_i$; part of the structure of $\alpha$ may not even be identifiable. Indeed, under assumptions (A.1)-(A.4) there is no way to guarantee that the bias $\|\alpha-E_\varepsilon(\hat\alpha)\|^2$ converges to zero and it can only be shown that $\|\hat\alpha-\alpha\|^2 = O_P(1)$ (see the proof of Theorem 2 below). This already highlights the theoretical difference between optimal estimation with respect to $\|\hat\alpha-\alpha\|_\Gamma^2$ and $\|\hat\alpha-\alpha\|^2$. Although sensible bounds for the bias may be derived under additional assumptions as indicated above, it must be emphasized that an estimator minimizing $\|\hat\alpha-\alpha\|^2$ will have to rely on $\rho \gg n^{-(2m+2q+1)/(2m+2q+2)}$, which corresponds to an oversmoothing with respect to $\|\hat\alpha-\alpha\|_\Gamma^2$. This effect has already been noted by Cai and Hall [4]. In our context, without additional assumptions linking the eigenvalues of $\Gamma$ and of the spline matrix $\mathbf{A}_m$, the only general bound for the $L^2$-variability of the estimator is $\|\hat\alpha-E_\varepsilon(\hat\alpha)\|^2 = O_P(\frac{1}{n\rho})$ (this result may be derived by arguments similar to those used in the proofs of our theorems). With $\rho \sim n^{-(2m+2q+1)/(2m+2q+2)}$ this leads to $\|\hat\alpha-E_\varepsilon(\hat\alpha)\|^2 = O_P(n^{-1/(2m+2q+2)})$, and better rates may only be achieved with $\rho \gg n^{-(2m+2q+1)/(2m+2q+2)}$. A more detailed study of this problem is not in the scope of the present paper.
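The exponent arithmetic in the preceding paragraph can be checked with exact fractions. This is a minimal sketch; the helper names `rate_exponent` and `l2_variability_exponent` are hypothetical, and the parameter `two_q` stands for the quantity $2q$ (so that half-integer $q$ can be handled with integers).

```python
from fractions import Fraction


def rate_exponent(m: int, two_q: int) -> Fraction:
    """Exponent (2m+2q+1)/(2m+2q+2) appearing in rho ~ n^{-exponent}."""
    return Fraction(2 * m + two_q + 1, 2 * m + two_q + 2)


def l2_variability_exponent(m: int, two_q: int) -> Fraction:
    """Exponent e such that 1/(n*rho) = n^e when rho = n^{-rate_exponent}."""
    # 1/(n * rho) = n^{-1} * n^{rate_exponent}
    return -1 + rate_exponent(m, two_q)


if __name__ == "__main__":
    # Assumed illustrative values: m = 2 and 2q = 1.
    print(rate_exponent(2, 1))            # 6/7
    print(l2_variability_exponent(2, 1))  # -1/7, i.e. -1/(2m+2q+2)
```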

3.3. Choice of smoothing parameters. The above result of Section 3.1 implies that the choice of the smoothing parameter $\rho$ is of crucial importance. A natural way to determine $\rho$ is to minimize a leave-one-out cross-validation criterion. We preferably adapt the simplified Generalized Cross-Validation (GCV) introduced by Wahba [31] in the context of smoothing splines. For fixed $m$, in our application the GCV criterion takes the form

(3.10) $$GCV_m(\rho) := \frac{(1/n)\|\mathbf{Y}-\mathbf{H}_\rho\mathbf{Y}\|^2}{(1-n^{-1}\mathrm{Tr}(\mathbf{H}_\rho))^2},$$

where $\mathbf{H}_\rho := (np^2)^{-1}\mathbf{X}\bigl(\frac{1}{np^2}\mathbf{X}^\tau\mathbf{X}+\frac{\rho}{p}\mathbf{A}_m\bigr)^{-1}\mathbf{X}^\tau$.
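To make the criterion concrete, here is a minimal sketch of how a GCV value of the form (3.10) could be evaluated once the hat matrix is available. It assumes the $n \times n$ hat matrix is supplied by the caller (how it is built from the design matrix and the spline penalty is not shown), and the names `gcv`, `select_rho` and `hat_matrix` are hypothetical.

```python
def gcv(y, hat):
    """GCV criterion: (1/n)||y - H y||^2 / (1 - Tr(H)/n)^2.

    y   : list of n (centered) responses
    hat : n x n hat matrix as a list of lists
    """
    n = len(y)
    fitted = [sum(hat[i][k] * y[k] for k in range(n)) for i in range(n)]
    mean_sq_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / n
    trace = sum(hat[i][i] for i in range(n))
    return mean_sq_resid / (1.0 - trace / n) ** 2


def select_rho(y, hat_matrix, grid):
    """Return the grid value of rho minimizing GCV.

    hat_matrix is an assumed user-supplied callable rho -> hat matrix.
    """
    return min(grid, key=lambda rho: gcv(y, hat_matrix(rho)))


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0]
    averaging = [[1.0 / 3.0] * 3 for _ in range(3)]  # toy hat matrix
    print(gcv(y, averaging))
```

A grid search is used here only for simplicity; any one-dimensional minimizer over $\rho$ could be substituted.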

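Proposition 2 below provides a justification for the use of the GCV criterion. Recall that the estimators $\hat{\boldsymbol\alpha} = \hat{\boldsymbol\alpha}_{\rho;m}$ depend on $\rho$ as well as on the spline order $m$. Obviously, $\frac{1}{p}\mathbf{X}\hat{\boldsymbol\alpha}_{\rho;m} = \mathbf{H}_\rho\mathbf{Y}$ is an estimator of the conditional mean $(\langle X_1-\bar X,\alpha\rangle,\dots,\langle X_n-\bar X,\alpha\rangle)^\tau$ of $\mathbf{Y}$ given $X_1,\dots,X_n$. Let

$$ASE_m(\rho) := \frac{1}{n}\sum_{i=1}^{n}\Bigl(\langle X_i-\bar X,\alpha\rangle-\frac{1}{p}\sum_{j=1}^{p}(X_i(t_j)-\bar X(t_j))\hat\alpha_{\rho;m}(t_j)\Bigr)^2$$

denote the average squared error of this estimator. The only difference between $ASE_m(\rho)$ and $\|\hat{\boldsymbol\alpha}_{\rho;m}-\boldsymbol\alpha\|_{\Gamma_{n,p}}^2$ is the discretization error encountered when approximating $\langle X_i,\alpha\rangle$ by $\frac{1}{p}\sum_j X_i(t_j)\alpha(t_j)$, and hence $ASE_m(\rho) = \|\hat{\boldsymbol\alpha}_{\rho;m}-\boldsymbol\alpha\|_{\Gamma_{n,p}}^2 + O_P(p^{-2\kappa})$.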

If $\hat\rho$ denotes the minimizer of GCV for fixed $m$, we can conclude from relation (3.11) of Proposition 2 that the error $ASE_m(\hat\rho)$ is asymptotically first-order equivalent to the error $ASE_m(\rho_{opt})$ to be obtained from an optimal choice of the smoothing parameter. Furthermore, (3.12) shows that an analogous result holds if GCV is additionally used to select the order $m$ of the smoothing spline, which means that the optimal rate can be reached adaptively.

Proposition 2. In addition to assumptions (A.1)-(A.3) as well as $np^{-2\kappa} = O(1)$, suppose that $E(\exp(\beta\varepsilon_i^2)) < \infty$ for some $\beta > 0$. If for fixed $m$, $\hat\rho$ denotes the minimizer of $GCV_m(\rho)$ over $\rho \in [n^{-2m+\delta},\infty)$ for some $\delta > 0$, then

(3.11) $$|ASE_m(\hat\rho)-ASE_m(\rho_{opt})| = O_P\bigl(n^{-1/2}ASE_m(\rho_{opt})^{1/2}\bigr),$$

where $\rho_{opt}$ minimizes $MSE_m(\rho) := E_\varepsilon(ASE_m(\rho))$ over all $\rho > 0$.

Furthermore, if $\hat m,\hat\rho$ denote the minimizers of (3.10) over $\rho \in [n^{-2m+\delta},\infty)$, $\delta > 0$, and $m = 1,\dots,M_n$, $M_n \le n/2$, then

(3.12) $$|ASE_{\hat m}(\hat\rho)-ASE_{m_{opt}}(\rho_{opt})| = O_P\bigl(n^{-1/2}ASE_{m_{opt}}(\rho_{opt})^{1/2}\log M_n\bigr),$$

where $\rho_{opt}, m_{opt}$ minimize $MSE_m(\rho) := E_\varepsilon(ASE_m(\rho))$ over all $\rho > 0$ and $m = 1,\dots,M_n$.

4. Case of a noisy covariate. In a number of important applications measurements of the explanatory curves $X_i$ may be contaminated by noise. There then additionally exists an errors-in-variable problem complicating further analysis. Our setup is inspired by other works dealing with noisy observations of functional data (e.g., Cardot [3] or Chiou, Müller and Wang [9]): At each point $t_j$ the corresponding functional value $X_i(t_j)$ is corrupted by some random error $\delta_{ij}$ so that actual observations $W_i(t_j)$ are given by

(4.1) $$W_i(t_j) = X_i(t_j)+\delta_{ij},\qquad i = 1,\dots,n,\ j = 1,\dots,p,$$

where $(\delta_{ij})_{i=1,\dots,n,\,j=1,\dots,p}$ is a sequence of independent real random variables such that for all $i = 1,\dots,n$ and all $j = 1,\dots,p$

(4.2) $$E_\varepsilon(\delta_{ij}) = 0,\qquad E_\varepsilon(\delta_{ij}^2) = \sigma_\delta^2\qquad\text{and}\qquad E_\varepsilon(\delta_{ij}^4) \le C_\delta$$

for some constant $C_\delta > 0$ (independent of $n$ and $p$). We furthermore assume that $\delta_{ij}$ is independent of $\varepsilon_i$ and of the $X_i$'s.

In this situation, an analogue of our estimator $\hat{\boldsymbol\alpha}$ of Section 2 can still be computed by replacing in (2.6) the (unknown) matrix $\mathbf{X}$ by the $n \times p$ matrix $\mathbf{W}$ with general terms $W_i(t_j)-\bar W(t_j)$, $i = 1,\dots,n$, $j = 1,\dots,p$. However, performance of the resulting estimator will suffer from the additional noise in the observations. If the error variance $\sigma_\delta^2$ is large, there may exist a substantial difference between $\mathbf{X}^\tau\mathbf{X}$ and $\mathbf{W}^\tau\mathbf{W}$. Indeed, $\mathbf{W}^\tau\mathbf{W}$ is a biased estimator of $\mathbf{X}^\tau\mathbf{X}$:

(4.3) $$\frac{1}{np^2}\mathbf{W}^\tau\mathbf{W} = \frac{1}{np^2}\mathbf{X}^\tau\mathbf{X}+\frac{\sigma_\delta^2}{p^2}\mathbf{I}_p+\mathbf{R},$$

where $\mathbf{R}$ is a $p \times p$ matrix such that its largest singular value is of order $O_P(\frac{1}{n^{1/2}p})$ (see the proof of Theorem 4 below). This result suggests that we use $\frac{1}{np^2}\mathbf{W}^\tau\mathbf{W}-\frac{\sigma_\delta^2}{p^2}\mathbf{I}_p$ as an approximation of $\frac{1}{np^2}\mathbf{X}^\tau\mathbf{X}$. A prerequisite is, of course, the availability of an estimator $\hat\sigma_\delta^2$ of the unknown variance $\sigma_\delta^2$. Following Gasser, Sroka and Jennen-Steinmetz [16], we will rely on

(4.4) $$\hat\sigma_\delta^2 := \frac{1}{n}\sum_{i=1}^{n}\frac{1}{6(p-2)}\sum_{j=2}^{p-1}\bigl[W_i(t_{j-1})-W_i(t_j)+W_i(t_{j+1})-W_i(t_j)\bigr]^2.$$
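The difference-based variance estimator (4.4) is simple to compute. The sketch below assumes equidistant observation points and the normalization $1/(6(p-2))$ that is standard for second differences of this form; the function name `noise_variance` is hypothetical.

```python
def noise_variance(W):
    """Difference-based estimate of the noise variance sigma_delta^2.

    W : list of n curves, each a list of p >= 3 noisy values W_i(t_1..t_p),
        assumed to be observed on an equidistant grid.
    """
    n = len(W)
    p = len(W[0])
    if p < 3:
        raise ValueError("at least three observation points are required")
    total = 0.0
    for curve in W:
        # Second differences W(t_{j-1}) - 2 W(t_j) + W(t_{j+1}) at interior points
        sq = sum(
            (curve[j - 1] - curve[j] + curve[j + 1] - curve[j]) ** 2
            for j in range(1, p - 1)
        )
        total += sq / (6 * (p - 2))
    return total / n


if __name__ == "__main__":
    # A noise-free linear curve has vanishing second differences.
    print(noise_variance([[0.0, 1.0, 2.0, 3.0]]))
```

The factor 6 reflects that the second difference of independent errors with variance $\sigma_\delta^2$ has variance $6\sigma_\delta^2$.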

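These arguments now lead to the following modified estimator $\hat{\boldsymbol\alpha}_W$ of $\boldsymbol\alpha$ in the case of noisy observations:

(4.5) $$\hat{\boldsymbol\alpha}_W := \frac{1}{np}\Bigl(\frac{1}{np^2}\mathbf{W}^\tau\mathbf{W}+\frac{\rho}{p}\mathbf{A}_m-\frac{\hat\sigma_\delta^2}{p^2}\mathbf{I}_p\Bigr)^{-1}\mathbf{W}^\tau\mathbf{Y}.$$

An estimator of the function $\alpha$ is given by $\hat\alpha_W = s_{\hat{\boldsymbol\alpha}_W}$, where $s_{\hat{\boldsymbol\alpha}_W}$ is again the natural spline interpolant of order $2m$ as defined in Section 2.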
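As an illustration of the denoising correction suggested by (4.3), the following sketch forms the corrected matrix $\frac{1}{np^2}\mathbf{W}^\tau\mathbf{W}-\frac{\hat\sigma_\delta^2}{p^2}\mathbf{I}_p$ from raw noisy curves. It is a sketch under stated assumptions, not the authors' implementation: the function name `corrected_gram` is hypothetical, and the penalty term and the matrix inversion of (4.5) are deliberately left out.

```python
def corrected_gram(W, sigma2_delta):
    """Return (1/(n p^2)) Wc^T Wc - (sigma2_delta / p^2) I_p.

    W            : list of n noisy curves, each a list of p values
    sigma2_delta : estimate of the noise variance
    Wc denotes the column-centered version of W.
    """
    n, p = len(W), len(W[0])
    col_means = [sum(row[j] for row in W) / n for j in range(p)]
    Wc = [[row[j] - col_means[j] for j in range(p)] for row in W]
    gram = [
        [sum(Wc[i][r] * Wc[i][s] for i in range(n)) / (n * p ** 2) for s in range(p)]
        for r in range(p)
    ]
    for r in range(p):
        gram[r][r] -= sigma2_delta / p ** 2  # subtract the noise contribution
    return gram


if __name__ == "__main__":
    print(corrected_gram([[1.0, 0.0], [-1.0, 0.0]], 0.4))
```

Note that the corrected matrix need not be positive semi-definite on its own (a diagonal entry can become negative, as in the toy example above).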

We want to note that $\hat{\boldsymbol\alpha}_W$ is closely related to an estimator proposed by Cardot et al. [5]. The latter is motivated by the Total Least Squares (TLS) method (see, e.g., Golub and Van Loan [17], Fuller [15], or Van Huffel and Vandewalle [29]) and the only difference from (4.5) consists in the use of a correction term slightly different from $-\frac{\hat\sigma_\delta^2}{p^2}\mathbf{I}_p$.

Of course there are many alternative strategies for dealing with the errors-in-variable problem induced by (4.1). A straightforward approach, which is frequently used in functional data analysis, is to apply nonparametric smoothing procedures in order to obtain estimates $\hat X_i(t_j)$ from the data $(W_i(t_j),t_j)$. When replacing $\mathbf{X}$ by $\hat{\mathbf{X}}$ in (2.6), one can then define a "smoothed" estimator $\hat{\boldsymbol\alpha}_S$. Of course this estimator may be as efficient as (4.5), but it is computationally more involved and appropriate smoothing parameters have to be selected for nonparametric estimation of each curve $X_i$.

Our aim is now to study the asymptotic behavior of $\hat{\boldsymbol\alpha}_W$. Theorem 4 below provides bounds (with respect to the semi-norm $\Gamma_{n,p}$) for the difference between $\hat{\boldsymbol\alpha}_W$ and the "ideal" estimator $\hat{\boldsymbol\alpha}$ defined for the true curves $X_1,\dots,X_n$. We will impose the following additional condition on the function $\alpha$:

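(A.5) For every $\delta > 0$ there exists a constant $C_\alpha < \infty$ such that

$$\frac{1}{p^{1/2}}\Bigl\|\frac{1}{np}\mathbf{X}^\tau\mathbf{X}\boldsymbol\alpha\Bigr\| \ge C_\alpha$$

holds with probability larger or equal to $1-\delta$.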

Theorem 4. Assume (A.1), (A.2), (A.5) as well as $np^{-2\kappa} = O(1)$, $\rho \to 0$, $1/(n\rho) \to 0$ as $n,p \to \infty$. Then

(4.6) $$\|\hat{\boldsymbol\alpha}_W-\hat{\boldsymbol\alpha}\|_{\Gamma_{n,p}}^2 = O_P\Bigl(\frac{1}{np\rho}+\frac{1}{n}\Bigr).$$

Together with assumption (A.3) we can therefore conclude from Theorems 1 and 4 that

$$\|\hat{\boldsymbol\alpha}_W-\boldsymbol\alpha\|_{\Gamma_{n,p}}^2 = O_P\Bigl(\rho+\bigl(n\rho^{1/(2m+2q+1)}\bigr)^{-1}+\frac{1}{np\rho}\Bigr).$$

We have already seen in Section 3 that the optimal order of the two first terms is reached for a choice of $\rho \sim n^{-(2m+2q+1)/(2m+2q+2)}$. From an asymptotic point of view, the use of $\hat{\boldsymbol\alpha}_W$ results in the addition of the extra term
l/{npp) in the rate of convergence. For p ^ n~*^^™'+^'^+-^)/^^™+^'?+^) we have 

l/{npp) ~ n-V(2m.+25+2) rpj^-g ^g^^ Q^^gj. ^-(2m+25+l)/(2m+2q+2) f^^. 

p fi{'^'i^+'^i~'^)/{'^'i^+'^i+'^) _ This means that the Sw reaches the same rate 
of convergence as a provided that p is sufficiently large compared to n. 
More precisely, it is required that p> Cpmax(n^/^'',n(^'"+^'^~^)/(^'"+^'^+^)) 
for some positive constant Cp. 

As shown in Theorem 5 below, these qualitative results generalize when considering the semi-norms $\Gamma_n$ or $\Gamma$.

Theorem 5. Assume (A.1)-(A.3), (A.5) as well as $np^{-2\kappa} = O(1)$, $\rho \to 0$, $1/(n\rho) \to 0$ as $n,p \to \infty$. Then

(4.7) $$\|\hat\alpha_W-\hat\alpha\|_{\Gamma_n}^2 = O_P\Bigl(\frac{1}{np\rho}+\frac{1}{n}\Bigr),$$

and if assumption (A.4) is additionally satisfied,

(4.8) $$\|\hat\alpha_W-\hat\alpha\|_{\Gamma}^2 = O_P\Bigl(\frac{1}{np\rho}+\frac{1}{n}+n^{-(2q+1)/2}\Bigr).$$

5. Application to ozone pollution forecasting. In this section, our methodology is applied to the problem of predicting the level of ozone pollution. For our analysis, we use a data set collected by ORAMIP (Observatoire Régional de l'Air en Midi-Pyrénées), an air observatory located in the city of Toulouse (France). The concentration of specific pollutants as well as meteorological variables are measured each hour. Some previous studies using the same data are described in Cardot, Crambes and Sarda [6] and Aneiros-Perez et al. [1].

The response variable $Y_i$ of interest is the maximum of ozone for a day. Repeated measurements of ozone concentration obtained for the preceding day are used as a functional explanatory variable $X_i$. More precisely, each $X_i$ is observed at $p = 24$ equidistant points corresponding to hourly measurements. The sample size is $n = 474$. It is assumed that the relation between $Y_i$ and $X_i$ can be modeled by the functional linear regression model (1.2). We note at this point that $X_1, X_2, \dots$ constitute a time series of functions, and that it is therefore reasonable to suppose some correlation between the $X_i$'s. The results of an earlier, unpublished study indicate that there only exists some "short memory" dependence.

Now, for a curve $X_{n+1}$ outside the sample, we want to predict $Y_{n+1}$, the maximum of ozone the day after. Assuming that $(X_{n+1},Y_{n+1})$ follows the same model (1.2) and using our estimators $\hat\alpha$ of $\alpha$ and $\hat\alpha_0$ of $\alpha_0$ described in Section 2, a predictor $\hat Y_{n+1}$ is given by the formula

(5.1) $$\hat Y_{n+1} := \hat\alpha_0+\int_I\hat\alpha(t)X_{n+1}(t)\,dt.$$

It cannot be excluded that actual observations of $X_i$ may be contaminated with noise. We will thus additionally consider the modified estimator $\hat\alpha_W$ developed in Section 4 and the corresponding predictor $\hat Y_{W,n+1}$. For simplicity, the integral in (5.1) is approximated by $\frac{1}{p}\sum_{j=1}^{p}\hat\alpha(t_j)X_{n+1}(t_j)$. With additional assumptions on the $\varepsilon_i$'s we can also build asymptotic intervals of prediction for $Y_{n+1}$. Indeed, let us assume that $\varepsilon_1,\dots,\varepsilon_{n+1}$ are i.i.d. random variables having a normal distribution $N(0,\sigma_\varepsilon^2)$. The first point is to estimate the residual variance $\sigma_\varepsilon^2$. A straightforward estimator is given by the empirical variance

(5.2) $$\hat\sigma_\varepsilon^2 := \frac{1}{n}\sum_{i=1}^{n}\Bigl(Y_i-\bar Y-\frac{1}{p}\sum_{j=1}^{p}\hat\alpha(t_j)(X_i(t_j)-\bar X(t_j))\Bigr)^2.$$

Our theoretical results imply that $\hat\sigma_\varepsilon^2$ is a consistent estimator of $\sigma_\varepsilon^2$. Furthermore, we can then infer from Theorem 3 that $(\hat Y_{n+1}-Y_{n+1})/\hat\sigma_\varepsilon$ asymptotically follows a standard normal distribution. Given $\tau \in\, ]0,1[$, an asymptotic $(1-\tau)$-prediction interval for $Y_{n+1}$ can be derived as

(5.3) $$\bigl[\hat Y_{n+1}-z_{1-\tau/2}\hat\sigma_\varepsilon,\ \hat Y_{n+1}+z_{1-\tau/2}\hat\sigma_\varepsilon\bigr],$$

where $z_{1-\tau/2}$ is the quantile of order $1-\tau/2$ of the $N(0,1)$ distribution. Of course, the same developments are valid when one replaces $\hat Y_{n+1}$ by $\hat Y_{W,n+1}$.
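The discretized predictor and the normal prediction interval described above can be sketched in a few lines. The function names (`predict`, `residual_variance`, `prediction_interval`) and their arguments are hypothetical; the estimated coefficients are assumed to be already available as plain lists of values at $t_1,\dots,t_p$.

```python
from statistics import NormalDist


def predict(alpha0_hat, alpha_hat, x_new):
    """Discretized predictor: alpha0_hat + (1/p) sum_j alpha_hat(t_j) x_new(t_j)."""
    p = len(alpha_hat)
    return alpha0_hat + sum(a * x for a, x in zip(alpha_hat, x_new)) / p


def residual_variance(y, X, alpha_hat):
    """Empirical residual variance on centered data (cf. the empirical variance above)."""
    n, p = len(y), len(alpha_hat)
    y_bar = sum(y) / n
    x_bar = [sum(row[j] for row in X) / n for j in range(p)]
    resid = [
        y[i] - y_bar - sum(alpha_hat[j] * (X[i][j] - x_bar[j]) for j in range(p)) / p
        for i in range(n)
    ]
    return sum(r ** 2 for r in resid) / n


def prediction_interval(y_pred, sigma_eps_hat, tau=0.05):
    """Asymptotic (1 - tau) interval y_pred -/+ z_{1-tau/2} * sigma_eps_hat."""
    z = NormalDist().inv_cdf(1.0 - tau / 2.0)
    return y_pred - z * sigma_eps_hat, y_pred + z * sigma_eps_hat


if __name__ == "__main__":
    y_hat = predict(1.0, [1.0, 1.0], [2.0, 4.0])
    print(y_hat, prediction_interval(y_hat, 1.0))
```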
In order to study performance of our estimators we split the initial sample into two sub-samples:

• A learning sample, $(X_i,Y_i)_{i=1,\dots,n_l}$, $n_l = 300$, was used to determine the estimators $\hat\alpha$ and $\hat\alpha_W$.

• A test sample, $(X_i,Y_i)_{i=n_l+1,\dots,n_l+n_t}$, $n_t = 174$, was used to evaluate the quality of the estimation.

Construction of estimators was based on $m = 2$ (cubic smoothing splines), and the smoothing parameters $\rho$ were selected by minimizing $GCV(\rho)$ as defined in (3.10). Note that GCV for $\hat\alpha_W$ requires that the matrix $\frac{1}{np^2}\mathbf{X}^\tau\mathbf{X}$ in the definition of $\mathbf{H}_\rho$ has to be replaced by $\frac{1}{np^2}\mathbf{W}^\tau\mathbf{W}-\frac{\hat\sigma_\delta^2}{p^2}\mathbf{I}_p$. Figure 1 presents the daily predicted values $\hat Y$ and $\hat Y_W$ of the maximum of ozone versus the measured $Y$-values of the test sample. Both graphics are close, which is confirmed by the computation of the prediction error given by

$$EQM(\hat\alpha) := \frac{1}{n_t}\sum_{i=n_l+1}^{n_l+n_t}(Y_i-\hat Y_i)^2,$$

with a similar definition for $\hat\alpha_W$. We have, respectively, $EQM(\hat\alpha) = 281.97$ and $EQM(\hat\alpha_W) = 270.13$, which shows a very minor advantage of the estimator $\hat\alpha_W$. In any case, in Figure 1 the points seem to be reasonably spread around the diagonal $\hat Y = Y$, and the plots do not indicate any major problem with our estimators. Corresponding prediction intervals are given in Figure 2.
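The evaluation protocol described above (a learning sample followed by a test sample, with the mean squared prediction error computed on the latter) can be sketched as follows. The helper names `split_sample` and `eqm` are hypothetical, and no actual ozone data are reproduced here.

```python
def split_sample(sample, n_learn):
    """Split a chronologically ordered sample into learning and test parts."""
    return sample[:n_learn], sample[n_learn:]


def eqm(y_true, y_pred):
    """Mean squared prediction error over the test sample."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    indices = list(range(474))              # placeholder for the 474 days
    learn, test = split_sample(indices, 300)
    print(len(learn), len(test))            # 300 174
    print(eqm([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```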

6. Proof of the results. 

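6.1. Proof of Theorem 1. First consider relation (3.1), and note that

$$E_\varepsilon(\hat{\boldsymbol\alpha}) = \frac{1}{np^2}\Bigl(\frac{1}{np^2}\mathbf{X}^\tau\mathbf{X}+\frac{\rho}{p}\mathbf{A}_m\Bigr)^{-1}\mathbf{X}^\tau\mathbf{X}\boldsymbol\alpha+\frac{1}{np}\Bigl(\frac{1}{np^2}\mathbf{X}^\tau\mathbf{X}+\frac{\rho}{p}\mathbf{A}_m\Bigr)^{-1}\mathbf{X}^\tau\mathbf{d},$$

where $\mathbf{d} = (d_1-\bar d,\dots,d_n-\bar d)^\tau$.

Fig. 1. Daily predicted values $\hat Y$ (left) and $\hat Y_W$ (right) of the maximum of ozone versus the measured values.

It follows that $E_\varepsilon(\hat{\boldsymbol\alpha})$ is a solution of the minimization problem

$$\min_{\mathbf{a}\in\mathbb{R}^p}\Bigl\{\frac{1}{n}\Bigl\|\frac{1}{p}\mathbf{X}\boldsymbol\alpha+\mathbf{d}-\frac{1}{p}\mathbf{X}\mathbf{a}\Bigr\|^2+\frac{\rho}{p}\mathbf{a}^\tau\mathbf{A}_m\mathbf{a}\Bigr\}.$$

This implies

$$\frac{1}{n}\Bigl\|\frac{1}{p}\mathbf{X}\boldsymbol\alpha+\mathbf{d}-\frac{1}{p}\mathbf{X}E_\varepsilon(\hat{\boldsymbol\alpha})\Bigr\|^2+\frac{\rho}{p}E_\varepsilon(\hat{\boldsymbol\alpha})^\tau\mathbf{A}_mE_\varepsilon(\hat{\boldsymbol\alpha}) \le \frac{\rho}{p}\boldsymbol\alpha^\tau\mathbf{A}_m\boldsymbol\alpha+\frac{1}{n}\|\mathbf{d}\|^2.$$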

But definition of A^, and (2.3) lead to 

P P Jo P Jo 



dt 



and (3.1) is an immediate consequence. Let us now consider relation (3.2). 
There exists a complete orthonormal system of eigenvectors ui,U2, ■ ■ ■ ,Up of 
^X-X such that ;i;X-X = E^=i K,jUjU^. Let k := [p-i/(2-+2g+i)]. gy our 
assumptions we obtain 



E,(||a-E,(a)||^„j 



-E, 



p \ n^p 



e^X 



^X-X + ^A„ 

np"^ p 



-1
(6.1)



where 



and 



1 



<^TV 

n 

n 



np 
1 



1 



P 



X — X"X ^X"X + ^Ara X"£ 



p 



-1 



— X"X + pA„ — X"X 



np 



1 



np 



\np 



1 



x(/,A„)-i/2 — X-X (/>A„)-V2 



<^Tr(Di,p + D2, 



n 



D 



2,P 



-1/2 I 



^j=k+l ) 



\i=A:+l / 

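which are symmetric $p \times p$ matrices with

(6.2) $$\sup_{\|\mathbf{v}\|=1}\mathbf{v}^\tau\mathbf{D}_{1,\rho}\mathbf{v} \le 1\qquad\text{and}\qquad\sup_{\|\mathbf{v}\|=1}\mathbf{v}^\tau\mathbf{D}_{2,\rho}\mathbf{v} \le 1.$$

Furthermore, $\mathbf{D}_{1,\rho}$ is of rank $k$ and therefore only possesses $k$ nonzero eigenvalues. Hence

(6.3) $$\mathrm{Tr}(\mathbf{D}_{1,\rho}) \le k.$$

Let $\mathbf{a}_{1,p},\dots,\mathbf{a}_{m,p},\mathbf{a}_{m+1,p},\dots,\mathbf{a}_{p,p}$ denote a complete, orthonormal system of eigenvectors of $\mathbf{A}_m$ corresponding to the eigenvalues $\mu_{1,p} = \cdots = \mu_{m,p} = 1$ and $\mu_{m+1,p} \le \cdots \le \mu_{p,p}$. By (6.1), (6.2) and (6.3) as well as (2.7), we thus obtain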

E,(||a-E,(a)||^_) 



V j=i
<^ik + m + k+ J2 ^IpiP-^r

^ \ l=m+k+l 



-1/2 



(6.4) 



\j=k+l ) / 



n 



< ^(m + 2k + CkCo) 
n 



n 



{m + [p 



-l/{2m+2g+l) 



])(2 + CCo). 



This proves Relation (3.2) and completes the proof of Theorem 1. 



6.2. Proof of Theorem 2. With di = Jj a{t)Xi{t) dt - j Yl'j=i a{tj)Xi{tj) 



we have 



\a — a\ 



< 



n 



E 

i=l 



{iXi-X),a-a) 



1 2 



-Y.(Mtj)-X{tj)){a{t,)-a{t,)) 



P 



(6.5) 



Q n 

+ -E 

1=1 



P 



Y,iX,-X)it,)iaitj)-a{tj)) 



4 " _ - 4 " 
< - -d? + - Y.(d^ - d? + 2||S - a||2^ ^. 

1=1 1=1 

By assumptions (A.1)-(A.3), it follows from Theorem 1, (3.3) and (3.4) that 
the assertion of Theorem 2 holds, provided that 



(6.6) 



1 



Y^{d,-df = opip- 



■2k\ 



1=1 



The proof of (6.6) consists of several steps. We will start by giving a stochastic bound for $\frac{1}{p}\hat{\boldsymbol\alpha}^\tau\hat{\boldsymbol\alpha}$ and then study the stochastic behavior of $\int_0^1\hat\alpha^{(m)}(t)^2\,dt$. The use of a suitable Taylor expansion will then lead to the desired result. By definition of $\hat{\boldsymbol\alpha}$ we have

-2 



1 

P 



1 



S^S < -ct^— X^X — X^X + pA. 



p np 



1 



np 



1 

np 



X^Xq
(6.7)



1 



np 
1 



-2 



+ 3^£^X — X^X + pA 



np 



X^d 



X"e. 



Since all eigenvalues of the matrix — X'^Xf— X'^X + pAm) ^— X'^X are 
less than or equal to 1, the first term on the right-hand side of (6.7) is less 
than or equal to ^ol^ol = 0(1). It is easily seen that the smallest eigenvalue 

of the matrix ^X(^X'^X + /9Am)~^X'^ is proportional to and thus 

the second term can be bounded by a term of order p~'^'^ / p. By (2.7) the 
expected value of the third term is bounded by 



0"; 



n 



Tr 



1 

np 



X( — X^X + pA, 



np 



<^Tr[(pA, 
n 



:0(l/(np)). 



We therefore arrive at 
(6.8) 



-a^a = Op ( 1 + 
p 



-2k 



+ 

np 



As a next step we will study the asymptotic behavior of $\int_0^1\hat\alpha^{(m)}(t)^2\,dt$. Since $\hat{\boldsymbol\alpha}$ is a solution of the minimization problem (2.5), we can write



Y- -xa 

p 

1 



< 



n 



p Jo 

p Jo 



and therefore 

nl 

P ' 

(6.9) 



a^'^\tfdt< \\di-cx\\l +-/Y--Xa,-Xa- -Xa 

n\ p p p 



+ p / a^'^\tYdt 







We have to focus on the term 

-/y - -Xa, -xa - -Xa 

n \ p p p 



2 / 1 1 
-(d + e,-Xa Xa 

n \ p p 

The Cauchy-Schwarz inequality together with the definition of || 



yield 



(6.10) 

Note that 
2 



1 



n 



1 



-d" -xa - -Xa =Op(p~'^||a-a||r„,,) 



£, -Xa 

n \ p 



n 



-Xa 



1 



xEe(a) - -Xa + -xa 



2 



n 



1. 



P 



-xE,(a;
Obviously, ^s'^ {-^K^{a) — -'Kcx) is a zero mean random variable with vari-

2 

ance bounded by ^||E£(S) — ctUp^ . By definition of S, (3.3), (6.1) and (6.4) 
we have 



1 ./I 



n \p 



Er -s^ -XS - -XE^(S) < ^ Tr 



P 



n 



Op 



1 



— X"X + pAm — X"X 



np 



1 



np 



1 



np 



l/(2m+2g+l) 



We can conclude that 
2 / 1 ^ 1 



1 



When combining (6.8), (6.9), (6.10) and (6.11) with the results of Theorem 
1 we thus obtain 



(6.12) 



a' 



p ^p(2m+2g+2)/(2m+2g+l) 

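Let us now expand $\hat\alpha$ into a Taylor series: $\hat\alpha(t) = P(t)+R(t)$ for all $t \in [0,1]$ with

$$P(t) = \sum_{l=0}^{m-1}\frac{t^l}{l!}\hat\alpha^{(l)}(0)\qquad\text{and}\qquad R(t) = \int_0^t\frac{(t-u)^{m-1}}{(m-1)!}\hat\alpha^{(m)}(u)\,du.$$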

It follows from (6.8) as well as (6.12) that |S(')(0)| = Op(l + (^)^/^ + 
( ^p(2m+2g+2V(2m+29+i) )^^^) for / = , . . . , 771 - 1 , aud somc straightforward cal- 
culations yield 



l|-||2 






Q a a 




L 


P 







(\p{t) + R{t)f dt-- J2{p{t,) + R{t,)y 



\i=i 



{P{t) + R{t) + P(tj) + R{tj)f dt 



P r .tj+l/(2p) 
t,-l/{2p) 

tj+l/(2p) 



n 2" 



1/2 



t,-l/(2p) 



|P'(s)| + \r{s)\ds 



1/2 



which leads to 



|a|| a OL 

P
(6.13)

= Op [p-^ . (^1 + ^ + [„^(2m+2g+2)/{2™+2g+l)pl 

Using again (6.8) and our assumptions on p,p,n, this implies 
(6.14) \\af = Op{l). 

At the same time, (6.8) and (6.12) together with assumptions (A.l) and 
(A. 2) imply that with Xi=Xi-X 

1 " ^ - 1 " / ^ ftj+'^/i'^p) 

+ a{tj){Xi{t)-Xiitj))dt] 



<2a:Lx E- / \P'{t)\ + \rit)\dt 



(t) -X(tj))^dt 



and thus 



l.^(d,-d)2 = OpL-2^1 + ^ + 



' \ \ p no(2™+29+2)/{2m+2g+l) 

(6.15) 



p np 



By our assumptions on p,p,n, relation (6.6) is an immediate consequence. 
This completes the proof of Theorem 2. 

6.3. Proof of Theorem 3. In terms of eigenvalues and eigenfunctions of 
r we obviously obtain 

r 

Let $\tau_{ri} = \langle X_i-E(X),\zeta_r\rangle$ for $r = 1,2,\dots$ and $i = 1,\dots,n$. Some well-known results of stochastic process theory now can be summarized as follows:

(i) $E(\tau_{ri}) = 0$, $E(\tau_{ri}^2) = \lambda_r$, and $E(\tau_{ri}\tau_{si}) = 0$ for all $r,s$, $s \ne r$ and $i = 1,\dots,n$.

(ii) For any $k = 1,2,\dots$, the eigenfunctions $\zeta_1,\dots,\zeta_k$ corresponding to $\lambda_1 \ge \cdots \ge \lambda_k$ provide a best basis for approximating $X_i$ by a $k$-dimensional linear space:

(6.16) $$\sum_{r=k+1}^{\infty}\lambda_r = E\Bigl(\Bigl\|X-E(X)-\sum_{s=1}^{k}\langle X-E(X),\zeta_s\rangle\zeta_s\Bigr\|^2\Bigr) \le E\Bigl(\inf_{f\in\mathcal{L}_k}\|X-E(X)-f\|^2\Bigr),$$

for any other $k$-dimensional linear subspace $\mathcal{L}_k$ of $L^2([0,1])$.

By (A.3) we can conclude that

(6.17) $$\sum_{r=k+1}^{\infty}\lambda_r = O(k^{-2q})\qquad\text{as } k \to \infty.$$

At first we have

$$\|\hat\alpha-\alpha\|_{\Gamma_n}^2 \le \frac{2}{n}\sum_{i=1}^{n}\langle\hat\alpha-\alpha,X_i-E(X)\rangle^2+\frac{2}{n}\sum_{i=1}^{n}\langle\hat\alpha-\alpha,E(X)-\bar X\rangle^2,$$

and by (6.14) and with assumption (A.4) the last term is of order $O_P(n^{-1})$. The relevant semi-norms can now be rewritten in the form



l"-"llr = J2^'^(^^- 

r=l 



a — a) 



r=l 



\a — a\ 



(6.18) 

and 

(6.19) 

+ Op(n-i), 

where I{r = s) = 1 if r = s, and /(r = s) =0 if r ^ s. Define 

^i'^ri - Ar) and frs = , '^riTsi, r^S 



OO OO /I 

l« - "llr + H X] "^"^ ~ X! '^"'^•'^ ~ Ar-/'(?^ = s) 

r=ls=l \ 1=1 / 



XrVn 



i=l 



(with TVs := if min{Ar, A^} = 0). The properties of Tri given in (i) imply 
that E(tVs) = for all r, s, and we can infer from assumption (A. 4) that for 
some Cio < oo 

(6.20) E{fl) < Cio, 

holds for all r,s = 1,2,... and all sufficiently large n. Using the Cauchy- 
Schwarz inequality we therefore obtain for all k = 0,1, . . . 

1 

n 



r=l s=l 



E E Uras - Y TriTsi - Xrl{r = s) 
X"- i=l 

-j^ OO OO 

—^YYl 0!ras{XrXs)^^'^f, 
y^r=ls=l
(6.21)

2 / k oo \l/2/fcoo \1 



*s I rs 

\r=ls=r / \r=ls=r / 

r) / oo oo \ 1/2 / oo oo \ 1/2 

^ ' ~2 ~2 1 / \ \ ~2 1 



r=fc+ls=r / \r=k+ls=r 



sTrs 



Relation (6.14) leads to ||S — a|p > J2'^i = Op{l), which together with 
(6.18) implies that for arbitrary k 

(k oo \ 1/2 / / oo \ / °° W^^"^ 

EEA.«?«'j <((EA^5,^j =Op(l|S-a||r). 

Choose k proportional to n^/^. Relation (6.17) then yields J2'i^k+i J2^r K^s < 
{EZk+iKf = 0{7i-^'^) and Er=iE^r A, = 0(max{logn,?i(i-2'?)/2}). Since 
by (6.20) the moments of uniformly bounded for all r,s, it follows 

that 

(fc oo \ 1/2 

$:$:A.r2j =Op(max{logn,n(i-2'')/4}), 
r=ls=r j 

(oo oo \ 1/2 

r=k+\ s=r J 

When combining these results we can conclude that 

oo oo / 1 ^ 

Y "rOs [ - E ^"'^si ~ -^rlir = s) 
r=l s=l \ 1=1 J 

= Op(max{n-i/2 logn • ||a - a||r, n-(29+i)/4 . ||S _ a||r, n-(29+i)/2}). 

Together with (6.19) assertion (3.9) now follows from the rates of conver- 
gence of IIS — a||r derived in Theorem 2. 

It remains to prove (1.5). Note that by our assumptions on and as- 
sumption (A.4) we have |E(y) - < 26^ + 2{a,E{X) - X)^ = Op(n-^). 
Together with (6.14) and assumption (A.4) this implies 

|E((a5 + {a,Xn+i) - ao - (a, X„+i))^|a5, a) - ||S - a||r| 
< 2|E(y) -Fp + 2{a,K{X) -Xf = Op(n~^), 

which completes the proof of the theorem.

6.4. Proof of Proposition 1. Depending on q, we first construct special
probability distributions of X-i. For 2g = 1, r G [0, 1] and r : = set XT-fi{t) : = 
1 for t G [0,r] and X^,o(i) := for t G (r, 1]. For 2g > 3, r G [0, 1], and r : = 
q - 0.5 let XrAt) ■= j^t' for t G [0,r] and := E -=0 If=DT^'"'(* " 

for tG (r, 1]. 

For k = l,2, . . . let >C(r+i)fc denote the {r + 1) • k dimensional linear space 
of all functions of the form gp{t) := E?=d(ELo Aj^O • Ht G H,^])- 
It is then easily verified that sup^g^/^ q,,.]^)/;.] min^ — Xr-r{t)\ = if 

^ li' ■^]' while supig[j/fc^(j+i)/fc] min^ \gp{t) - Xr;r{t)\ < A;"'" if r G [i^^]- 
It follows that there exist constants Br <1 such that the functions BrXr-rii) 
satisfy inf^.e^^^^,,, J^{BrX^.r{t) -gp{t)f dt < C(r + 2)-(2-+i)fc-(2-+i) =C{r + 
2)-2<?^-2g all A: = 1,2,.... 

Now let Ti, . . . , T„ denote i.i.d. real random variables which are uniformly 
distributed on [0,1] and let Xt^-^ = BrXr-.rit) — ^(^^^^-.^^(t)). Obviously, 

Tj — > Xri^r{t) is a continuous mapping from [0, 1] on L'^{[0, 1]), and the prob- 
ability distribution of Tj induces a corresponding centered probability distri- 
bution on L2([0,1]). Since the eigenfunctions of the corresponding covari- 
ance operator provide a best basis for approximating Xi by a A;-dimensional 
linear space, we obtain from what is done above 

£ A,<e( inf \\Xr,,r-g}f)<Ck-''', 

j = k + l ^f/3^-^{r + l)[fc/{r + l)] / 

for all sufficiently large k and C*^j.+i)k •= {9l3 - E(5r-'^ri;r) 15/3 G ^{r+i)k}- 

In order to verify that P^. G Vq^c^ it remains to check the behavior of 
||q — Q||r = /Q^(XT-.r,S — a)'^ dr. First note that although assumption (A. 2) 
does not hold for 2q = 1, even in this case, with k = 1/2, relation (3.4) holds 
and arguments in the proof of Theorems 1 and 2 imply that for sufficiently 
large p, ^ Er=i(^r,;r, S - q)2 = Op(n-(2'»+2'/+i)/(2m+2g+2))^ ^OT some 1 > 

S > 2m+2g+2 define a partition of [0, 1] into disjoint intervals /i, . . . 
of equal length n~^. For j = l,...,n^, let Sj denote the midpoint of the 
interval Ij , and use nj denote the (random) number of ri Tn falling into 
Ij . By using the Cauchy-Schwarz inequality as well as a definition of X^-r 
it is easily verified that there exists a constant L,. < oo such that | {X^-r , 3 — 
a) - {Xr-'-r,a-a)\ < Lr\T - T*\'^/'^\\a - a\\ for r,r* G [0,1] (|r-r*|i/2 j^^y 
be replaced by |r — r*| if 2g > 1). Then 

\{Xr-^r,ci — a)^ — {Xr*-r,ci — Ct)^! 

< 2Lr\T — T*\^^'^\\a — a\\ min{ | (Xt-;^ , S — | {Xr'-r, a — a)\} 

I r2i *|||-^ ||2 

+ Lj.\t — T Ilia — a|| .
By (6.14) another application of the Cauchy-Schwarz inequality leads to

1 



Since supj^;^^ „5 ^.^"•'^^ = Op(l) with E(nj) = n • n"^ , we can conclude 

that i YJjll IE(nj)(X,^.;r, S - q)2 = Op(^-{2m,+2g+l)/(2m+2g+2))_ Finally, 



„1 -j^ 

/ (X^;r,S-a)2dr- - VE(nj)(X,,;^,Q-a)' 
< — ^sup|(X^;r,S-a)^ - {Xs^-r,a- a)'^\ 

= op(n-(2™+2g+l)/(2m+2g+2)^^ 

and the desired result ||S — a||r = Op(n~(^"^+^'?+-^)/(^'"+^''+^)) is an imme- 
diate consequence. Therefore, £Vq^c- 

We now have to consider the functionals {Xr-^ryCt) niore closely. Let 
C* {m + r + 1, D) denote the space of all m + r + 1-times continuously differen- 
tiable functions a satisfying /q a{t) dt = as well as /q a^^^t)"^ dt< D for all 
j = 0, 1, . . . ,m + r + 1 as well as a(^)(0) = a^i\l) = for all j = 0, . . . , r + 1, 
and set C*{m,r,D) = {a\a = a^^^^\a € C*{m + r + 1,D)}. Then, for any 
a G C* {m,0, D) there is a a G C*{m + 1,D) such that 

{Xr,,o, a) = Bo r ait) dt - (E(SoX,^;o), a) 
Jo 

= BQa{Ti) — Bq i a{t) dt = BQa{Ti) 
Jo 

while for any a E C*{m,r,D), r > 1 and a G C*{m + r + 1,D), a = a^^~^^\ 
partial integration leads to 

(X.,;„a) = (-l)^-i(X([;;i),a(2)) 

= {x^:-'\n)a^'\n) - x(r;i)(o)5(2)(o)) 

+ Bri-iy £' 5^^) (t) dt - Bri-iy'E 5(^) {t) dt^ 

+ (x(:;;i)(l)5(2)(i)_4-i)(^^)5(2)(^^)) 

= Br{-lYa{Ti) - nBr{-lYa{n)) = Bri-iyain). 

Obviously, a* = 1)^5 € C*{m + r + l,BrD). By construction, with 
faiTi) := {Xri,r,a) we generally obtain

\\a-d{a,Pp)\\l = {fair) - fa{a,Pr){T)f dr.
By definition, fa{T) = 5*(t) = E(li|rj = r) is the regression function in the
regression model Yi = 5*(rj) and we will use the notation Sn{a*) to de- 
note an estimator of a* from the data {Yi, Ti), . . . , (Yn, r„). Note that knowl- 
edge of {Yi,Ti) is equivalent to knowledge of {Yi,XT--r), and an estimator 
fa{a,Pr) can thus be seen as a particular estimator Sn{oi*) based on 

(Yi,Ti), . . . , {Yn,Tn). We Can conclude that as oo, 

sup sup inf P(||a — a(a, P)||p 

> Cn ■ n-(2™+2'?+l)/(2m+2(;+2)^ 

>^ sup ini f( f\a*{T)- Sn{a*){T)fdT 

> Cn ■ n-(2"^+2'?+i)/(2™+2g+2)^ _^ I 

Convergence of the last probability to 1 follows from well-known results on 
optimal rates of convergence in nonparametric regression (cf. Stone [27]). 

6.5. Proof of Proposition 2. We first consider (3.11). The set $\{\mathbf{H}_\rho\}_{\rho>0}$ constitutes an ordered linear smoother according to the definition in Kneip [20]. Theorem 1 of Kneip [20] then implies that $|MSE_m(\rho^*)-MSE_m(\rho_{opt})| = O_P(n^{-1/2}\times MSE_m(\rho_{opt})^{1/2})$, where $\rho^*$ is determined by minimizing Mallows' $C_L$, $C_L(\rho) := \frac{1}{n}\|\mathbf{Y}-\mathbf{H}_\rho\mathbf{Y}\|^2+2\frac{\sigma_\varepsilon^2}{n}\mathrm{Tr}(\mathbf{H}_\rho)$. Note that although we
consider centered values Yi — Y instead of Yi all arguments in Kneip [20] 
apply, since (y, . . . , y)'^X = 0. The arguments used in the proof of The- 
orem 1 of Kneip ([20], relations (A.17)-(A.22)) imply that for all p the 
difference Cl{p) - CLiPopt) - (MSEmip) - MSEm{popt)) can be bounded 
by exponential inequalities given in Lemma 3 of Kneip [20] [the squared 
norm (^^(Hp, Hp^pJ^ appearing in these inequalities can be bounded by 
2MSEm{p)]. These results lead to 

Clip) - CliPopt) = MSEmip) - MSEm{Popt) 

(6.22) 

+ V^mn-'"MSEm{pf/\ 
ASEM - ASEmiPopt) = MSEM - MSEm{popt) 

(6.23) 

+ V?inri-'^^MSEM'/^ 
(6.24) i||Y - HpYf = + MSEm{Popt) + rygU"'/', 

where r/p^L are random variables satisfying supp^g |^p;ln| = Op(l), s = 1, 2, 3. 
By our assumptions and the arguments used in the proof of Theorem 1
we can infer that n^iTr(Hp) = Op([npi/(2™+2'?+i)]-i) ^ ^^^i) fo^. p e

as n ^ oo. Furthermore, there exists a constant D < oo such 
thatn-^Tr(Hp) <D-M6'£^^(p) = Op(p+[npi/(2m+2g+i)]-i^)^ Together with 

(6.24) a Taylor expansion of GCV m{p) with respect to n^^Tr(Hp) then 
yields 

1 iiv xjviiSio^iiv xj vii2Tr(Hp) 



GCF„,(p) = -||Y - HpYll^ + 2-||Y - H^Y 
J41 ^^(Hp 



(6.25) +77, 



n n n 

:r(H 

77, 

,[5] ( 



V 77 

Cl{p) + il(n-^2 + MSEM)^^ 



n 



where again 7/pfm are random variables with supp>„-2m+« |t/p^L| = Op(l), 
s = 4,5. Together with MSE^rn{popt) = Op(n-2'"+29+V(2m+2g+2))^ Relation 
(3.11) now is an immediate consequence of (6.22)-(6.25). 

Since Lemma 3 of Kneip [20] provides exponential inequalities, it is eas- 
ily verified that uniform bounds similar to (6.22)-(6.25) hold for all p £ 
[77~2'"+'5^ oo) and all ttt, = 1, . . . ,M„, if 77p^ln are replaced by r^pfm • logM„, 

s = 1, . . . , 5. Then supp>„-2m+«_^=^...^j.f^ |f/|,;m| = Op(l), s = 1, . . . , 5. The 
proof of (3.12) then follows the arguments used above. 



6.6. Proof of Theorem 4- Consider the following decomposition: 



P 



-1 



1 

np 



-W^Y 



7ip 



where 



np^ 



p 



np^ 



P 



T:=R-^I, 



and where S is the n x p matrix with generic element 6ij — 6j, i = 1, . . . ,n, 
j = 1, . . . ,p and the matrix R is defined in (4.3). Thus one obtains 



|qw -S||r„,j, < 



(6.26) 



np^ 



P 



np 



1 



SI — W^Y 

np
Note that Ee((^X'^X + ^A^) ^^^^^) = , whereas with assumptions
(A.l) and (A.2) 

1 



np 



X"X+^A, 

P 



-1 



np 



n^p 



\np 



-1 



1 

np 



X^X( — X^X + pA, 

np 



\np \\np } , 

This leads with the properties of the eigenvalues of (— X'^X + pA^)"^ 



to 



(6.27) 



^X^X + ^A, 

nj>^ p 



-6^Y 



np 



Op 



{nppYl"^ ) 



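The next step consists in studying the behavior of the matrix $R$ defined in
(4.3). Its generic term is

$$
\frac{1}{np^2}\sum_{i=1}^n \Big[(X_i(t_r) - \bar X(t_r))(\delta_{is} - \bar\delta_s) + (X_i(t_s) - \bar X(t_s))(\delta_{ir} - \bar\delta_r) + (\delta_{ir} - \bar\delta_r)(\delta_{is} - \bar\delta_s)\Big] - \frac{\sigma_\delta^2}{p^2}\,1[r = s],
$$

for $r, s = 1, \ldots, p$, so that for any $u \in \mathbb{R}^p$ such that $\|u\| = 1$ one has
$\|E_\varepsilon(Ru)\| = O_P\big(\frac{1}{np^2}\big)$, whereas it is easy to see that with assumptions (A.1) and (A.2)
and (4.2), $E_\varepsilon(\|Ru\|^2) = O_P\big(\frac{1}{np^2}\big)$ and then $\|R\| = O_P\big(\frac{1}{n^{1/2}p}\big)$. Now to derive an
upper bound for the norm of the matrix $T$, we use the convergence result given in Gasser,
Sroka and Jennen-Steinmetz [16] which in our framework implies that
$\hat\sigma_\delta^2 = \sigma_\delta^2 + O_P\big(\frac{1}{n^{1/2}p}\big)$. Together with the order of $\|R\|$ this yields

$$
\|T\| = O_P\Big(\frac{1}{n^{1/2}p}\Big). \tag{6.28}
$$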



For the second term in (6.26) we consider at first its Frobenius norm. We 
have 



S( — W^Y 

np 



< 



1 



p 



1/2 



AtX^X + ^A^ + T 



np^ 



P 



np'^ p 



-V 



1/2 



< 



P 



1/2 



^x^x + ^aJ 

np'^ p / 



1 

— W^Y 


2 


— WY 


-1 


||T|| 




/ np 




np 





where the second inequality comes from the first inequality in Demmel [11].
Note that with assumptions (A.2) and (A.5), for every $\delta > 0$, there is a
positive constant such that $p^{1/2}\|E_\varepsilon(\frac{1}{np}W^\tau Y)\|$ is greater than this constant
with a probability larger than or equal to $1 - \delta$. We also have
$E_\varepsilon\big(\|\frac{1}{np}W^\tau Y - E_\varepsilon(\frac{1}{np}W^\tau Y)\|^2\big)$, which is of order $\frac{1}{np}$. This gives finally when combining
(6.8), (6.28) and the condition on $\rho$ and $p$ as well as assumption (A.2)



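$$
\Big\|S\,\frac{1}{np}W^\tau Y\Big\|_{\Gamma_{n,p}} = O_P\Big(\Big\|S\Big(\frac{1}{np}W^\tau Y\Big)\Big\|\Big) = O_P\Big(\frac{1}{n^{1/2}}\Big), \tag{6.29}
$$

which, together with (6.26) and (6.27), concludes the proof of Theorem 4.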
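
The algebra behind the decomposition used in this proof can be verified numerically. The sketch below rests on several assumptions that are not taken from the paper: simulated curves, a stand-in penalty matrix `A` (identity plus squared second differences, not the matrix $A_m$), and the true noise variance used in place of the estimator $\hat\sigma_\delta^2$. It checks that $\hat\alpha_W$ equals $\hat\alpha$ plus a noise term plus a perturbation term, and that the triangle inequality in the semi-norm $\|\cdot\|_{\Gamma_{n,p}}$ holds for these two terms.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, rho, sigma_delta = 200, 30, 1.0, 0.3   # illustrative values only

# Hypothetical smooth, centered predictor curves observed at t_1 < ... < t_p.
t = (np.arange(p) + 0.5) / p
scores = rng.normal(size=(n, 4)) / np.arange(1, 5)
X = scores @ np.cos(np.pi * np.outer(np.arange(1, 5), t))
X -= X.mean(axis=0)
Y = X @ np.sin(2 * np.pi * t) / p + 0.1 * rng.normal(size=n)
Y -= Y.mean()
delta = sigma_delta * rng.normal(size=(n, p))
delta -= delta.mean(axis=0)                  # generic element delta_ij - mean_j
W = X + delta                                # noisy discretized curves

# Stand-in penalty (assumption): identity plus squared second differences.
D = np.diff(np.eye(p), n=2, axis=0)
A = np.eye(p) + D.T @ D

sigma2_hat = sigma_delta**2                  # assumption: true variance, no estimation
M = X.T @ X / (n * p**2) + (rho / p) * A
M_W = W.T @ W / (n * p**2) + (rho / p) * A - sigma2_hat / p**2 * np.eye(p)
T = M_W - M                                  # perturbation matrix
S = np.linalg.inv(M_W) - np.linalg.inv(M)

alpha_hat = np.linalg.solve(M, X.T @ Y) / (n * p)
alpha_hat_W = np.linalg.solve(M_W, W.T @ Y) / (n * p)
noise_term = np.linalg.solve(M, delta.T @ Y) / (n * p)
perturbation_term = S @ (W.T @ Y) / (n * p)
reconstructed = alpha_hat + noise_term + perturbation_term

def gamma_np_norm(u):
    """Empirical semi-norm: sqrt(u' X'X u / (n p^2)) = ||X u|| / (sqrt(n) p)."""
    return float(np.linalg.norm(X @ u) / (np.sqrt(n) * p))

gap = gamma_np_norm(alpha_hat_W - alpha_hat)
bound = gamma_np_norm(noise_term) + gamma_np_norm(perturbation_term)
```

The identity is purely algebraic (it only uses $W = X + \delta$ and the definition of $S$), so it holds for any invertible pair of matrices; the rates in the proof come from bounding the two terms separately.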

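6.7. Proof of Theorem 5. We first prove (4.7). Obviously,

$$
\|\hat\alpha_W - \alpha\|^2_{\Gamma_n} \le \frac{2}{n}\sum_{i=1}^n (d_{i,W} - \bar d_W)^2 + 2\|\hat\alpha_W - \alpha\|^2_{\Gamma_{n,p}},
$$

where

$$
d_{i,W} = \int_I (\hat\alpha_W(t) - \alpha(t))X_i(t)\,dt - \frac{1}{p}\sum_{j=1}^p (\hat\alpha_W(t_j) - \alpha(t_j))X_i(t_j).
$$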

Then, assertion (4.6) implies that (4.7) is a consequence of 



(6.30) 



1 



- y,{di,w - dwf = 0p{ + - . 



n 



i=l 



1 



1 



The proof of (6.30) follows the same structure as the proof of (6.6). Indeed, 
we have 



1 



n 



^^{diy^ - dwY 



i=l 



<2xi 



(6.31) 



\j=i ■ 



i,+l/(2p) 
t,-l/(2p) 



\P'{t)\ + \P{v\ + \r{t)\ + \rw\dt 



+ 2— IISw — SI 
P 



1 " P rtj+i/(2p) 



n 



EE 

i=lj=l-^<.-l/(2p) 



((X,(t) -X(t)) - iXiitj)-Xitj))fdt, 



where $P_W(t) = \sum_{l=0}^{m-1}\frac{t^l}{l!}\hat\alpha_W^{(l)}(0)$, $R_W(t) = \int_0^t \frac{(t-u)^{m-1}}{(m-1)!}\hat\alpha_W^{(m)}(u)\,du$ and $P(t)$ and
$R(t)$ are similarly defined for $\hat\alpha$ (see the proof of Theorem 2).
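
The splitting of a smooth function into a polynomial part and an integral remainder is the standard Taylor formula of order $m$. As a minimal numerical sketch (the test function $\sin$, the order $m = 2$ and the helper name `taylor_parts` are illustrative assumptions, not objects from the paper), the two parts can be computed and summed as follows.

```python
import math
import numpy as np

def taylor_parts(derivs, m, t, grid=2001):
    """Polynomial part P(t) and integral remainder R(t) of order m.

    derivs[l] must evaluate the l-th derivative for l = 0, ..., m.
    """
    P = sum(t**l / math.factorial(l) * derivs[l](0.0) for l in range(m))
    u = np.linspace(0.0, t, grid)
    integrand = (t - u) ** (m - 1) / math.factorial(m - 1) * derivs[m](u)
    # Trapezoidal rule for the remainder integral over [0, t].
    R = float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(u)) / 2.0)
    return P, R

m, t = 2, 0.8
derivs = [np.sin, np.cos, lambda u: -np.sin(u)]  # f, f', f''
P, R = taylor_parts(derivs, m, t)
error = abs(P + R - math.sin(t))
```

Here $P(t) = t$ and $R(t) = \sin t - t$, so that $P + R$ recovers the function up to quadrature error.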

Replacing the semi-norm $\|\cdot\|_{\Gamma_{n,p}}$ by the Euclidean norm in (4.6) and following
the same lines as in the proof of Theorem 4, one can show that

(6.32) -||q;w-« 
P 



P = -(Sw — S)^(Sw — S) = Op 
P 



( 1 1 



\npp^ 



n 
which together with assumption (A.2) implies that the second term on the

— Ik —2k, 

right-hand side of (6.31) can be bounded by Op{^—;^ + ^ — 



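Now the remainder of the proof consists in studying $\int_I \hat\alpha_W^{(m)}(t)^2\,dt$. Recalling
the definition of $\hat\alpha_W$, we have

$$
\frac{1}{n}\Big\|Y - \frac{1}{p}W\hat\alpha_W\Big\|^2 + \frac{\rho}{p}\hat\alpha_W^\tau P_m\hat\alpha_W + \rho\int_I \hat\alpha_W^{(m)}(t)^2\,dt - \frac{\hat\sigma_\delta^2}{p^2}\hat\alpha_W^\tau\hat\alpha_W
\le \frac{1}{n}\Big\|Y - \frac{1}{p}W\hat\alpha\Big\|^2 + \frac{\rho}{p}\hat\alpha^\tau P_m\hat\alpha + \rho\int_I \hat\alpha^{(m)}(t)^2\,dt - \frac{\hat\sigma_\delta^2}{p^2}\hat\alpha^\tau\hat\alpha
$$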



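and then

$$
\begin{aligned}
\rho\int_I \hat\alpha_W^{(m)}(t)^2\,dt
\le{}& \frac{1}{n}\Big\|\frac{1}{p}W(\hat\alpha_W - \hat\alpha)\Big\|^2
+ \frac{2}{n}\Big\langle Y - \frac{1}{p}W\hat\alpha,\ \frac{1}{p}W\hat\alpha - \frac{1}{p}W\hat\alpha_W\Big\rangle \\
&- \frac{\rho}{p}\hat\alpha_W^\tau P_m\hat\alpha_W + \frac{\rho}{p}\hat\alpha^\tau P_m\hat\alpha
+ \frac{\hat\sigma_\delta^2}{p^2}\hat\alpha_W^\tau\hat\alpha_W - \frac{\hat\sigma_\delta^2}{p^2}\hat\alpha^\tau\hat\alpha
+ \rho\int_I \hat\alpha^{(m)}(t)^2\,dt.
\end{aligned}
\tag{6.33}
$$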
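
The comparison of criterion values at $\hat\alpha_W$ and at another vector uses that $\hat\alpha_W$ minimizes a least squares criterion with penalty and variance correction. A minimal sketch of this fact follows, under assumptions that are not from the paper: random data, a stand-in penalty matrix `A` playing the role of the full quadratic penalty, and an arbitrary value for `sigma2_hat`. When the corrected matrix is positive definite the criterion is convex and the solution of the normal equations is its global minimizer.

```python
import numpy as np

def corrected_criterion(a, W, Y, A, rho, sigma2_hat):
    """(1/n)||Y - W a / p||^2 + (rho/p) a'A a - (sigma2_hat / p^2) a'a."""
    n, p = W.shape
    resid = Y - W @ a / p
    return resid @ resid / n + (rho / p) * (a @ A @ a) - sigma2_hat / p**2 * (a @ a)

# Illustrative data; all values below are assumptions for the sketch.
rng = np.random.default_rng(2)
n, p, rho, sigma2_hat = 100, 20, 0.5, 0.09
W = rng.normal(size=(n, p))
Y = rng.normal(size=n)
D = np.diff(np.eye(p), n=2, axis=0)
A = np.eye(p) + D.T @ D                      # stand-in penalty, not the paper's A_m

# Normal equations of the corrected criterion.
M_W = W.T @ W / (n * p**2) + (rho / p) * A - sigma2_hat / p**2 * np.eye(p)
alpha_hat_W = np.linalg.solve(M_W, W.T @ Y) / (n * p)

is_convex = bool(np.all(np.linalg.eigvalsh(M_W) > 0))
base = corrected_criterion(alpha_hat_W, W, Y, A, rho, sigma2_hat)
others = [
    corrected_criterion(alpha_hat_W + rng.normal(scale=0.5, size=p), W, Y, A, rho, sigma2_hat)
    for _ in range(5)
]
```

Any competitor, in particular the estimator built from the noise-free curves, therefore yields a criterion value at least as large as `base`, which is the inequality the proof starts from.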

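First consider the term $\frac{1}{n}\|\frac{1}{p}W(\hat\alpha_W - \hat\alpha)\|^2$. By (4.6) and (6.32) we obtain

$$
\frac{1}{n}\Big\|\frac{1}{p}W(\hat\alpha_W - \hat\alpha)\Big\|^2 = O_P\Big(\frac{1}{np\rho} + \frac{1}{n}\Big). \tag{6.34}
$$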

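We focus now on the second term in the right-hand side of (6.33), for which
we have the following decomposition:

$$
\begin{aligned}
\frac{1}{n}\Big\langle Y - \frac{1}{p}W\hat\alpha,\ \frac{1}{p}W\hat\alpha - \frac{1}{p}W\hat\alpha_W\Big\rangle
={}& \frac{1}{n}\Big\langle \frac{1}{p}X\alpha - \frac{1}{p}W\hat\alpha,\ \frac{1}{p}W\hat\alpha - \frac{1}{p}W\hat\alpha_W\Big\rangle \\
&+ \frac{1}{n}\Big\langle d,\ \frac{1}{p}W\hat\alpha - \frac{1}{p}W\hat\alpha_W\Big\rangle
+ \frac{1}{n}\Big\langle \varepsilon,\ \frac{1}{p}W\hat\alpha - \frac{1}{p}W\hat\alpha_W\Big\rangle.
\end{aligned}
$$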



n 



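We have

$$
\frac{1}{n^{1/2}}\Big\|\frac{1}{p}X\alpha - \frac{1}{p}W\hat\alpha\Big\|
\le \frac{1}{n^{1/2}}\Big\|E_\varepsilon\Big(\frac{1}{p}X\alpha - \frac{1}{p}W\hat\alpha\Big)\Big\|
+ \frac{1}{n^{1/2}}\Big\|\frac{1}{p}X\alpha - \frac{1}{p}W\hat\alpha - E_\varepsilon\Big(\frac{1}{p}X\alpha - \frac{1}{p}W\hat\alpha\Big)\Big\|.
$$

Some straightforward calculations and previous results lead to
$\frac{1}{n^{1/2}}\|\frac{1}{p}X\alpha - \frac{1}{p}W\hat\alpha - E_\varepsilon(\frac{1}{p}X\alpha - \frac{1}{p}W\hat\alpha)\| = O_P\big((1/(n\rho^{1/(2m+2q+1)}))^{1/2} + 1/p^{1/2}\big)$, whereas
$\frac{1}{n^{1/2}}\|E_\varepsilon(\frac{1}{p}X\alpha - \frac{1}{p}W\hat\alpha)\| = O_P(\rho^{1/2} + p^{-\kappa})$. Together with (6.34), this finally leads with the
Cauchy-Schwarz inequality to

$$
\frac{1}{n}\Big\langle \frac{1}{p}X\alpha - \frac{1}{p}W\hat\alpha,\ \frac{1}{p}W\hat\alpha - \frac{1}{p}W\hat\alpha_W\Big\rangle
= O_P\Big(\Big(\rho^{1/2} + p^{-\kappa} + \frac{1}{(n\rho^{1/(2m+2q+1)})^{1/2}} + \frac{1}{p^{1/2}}\Big)\Big(\frac{1}{np\rho} + \frac{1}{n}\Big)^{1/2}\Big). \tag{6.35}
$$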

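Using again the Cauchy-Schwarz inequality and (6.34) we have

$$
\frac{1}{n}\Big\langle d,\ \frac{1}{p}W\hat\alpha - \frac{1}{p}W\hat\alpha_W\Big\rangle = O_P\Big(\frac{p^{-\kappa}}{(np\rho)^{1/2}} + \frac{p^{-\kappa}}{n^{1/2}}\Big). \tag{6.36}
$$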



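The last term is such that

$$
\frac{1}{n}\varepsilon^\tau\Big(\frac{1}{p}W(\hat\alpha - \hat\alpha_W)\Big)
= -\frac{1}{n}\varepsilon^\tau\Big(\frac{1}{p}W\Big(\frac{1}{np^2}X^\tau X + \frac{\rho}{p}A_m\Big)^{-1}\frac{1}{np}\delta^\tau Y\Big)
- \frac{1}{n}\varepsilon^\tau\Big(\frac{1}{p}WS\Big(\frac{1}{np}W^\tau Y\Big)\Big).
$$

Using the same developments as above and using assumptions (A.1) and (A.2)
we obtain that $\frac{1}{n}\varepsilon^\tau\big(\frac{1}{p}W\big(\frac{1}{np^2}X^\tau X + \frac{\rho}{p}A_m\big)^{-1}\frac{1}{np}\delta^\tau Y\big) = O_P\big(\frac{1}{n(p\rho)^{1/2}}\big)$ while
$\frac{1}{n}\varepsilon^\tau\big(\frac{1}{p}WS(\frac{1}{np}W^\tau Y)\big) = O_P\big(\frac{1}{n}\big)$. This finally leads to

$$
\frac{1}{n}\varepsilon^\tau\Big(\frac{1}{p}W(\hat\alpha - \hat\alpha_W)\Big) = O_P\Big(\frac{1}{n(p\rho)^{1/2}} + \frac{1}{n}\Big). \tag{6.37}
$$

Finally, using the same arguments as in the proof of Theorem 2, assertion
(6.30) is a consequence of (6.31), (6.8) and (6.12) as well as the bounds
obtained in (6.32)-(6.37) and the conditions on $n$, $p$ and $\rho$.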

It remains to show (4.8). The proof follows the same lines as the proof of 
Theorem 3. We have the following relation: 



|aw - a||r„ 



1 " 



|3w-S||r + X!X!"w,raw,s -^TriT^i- Kiij = s) +Op{n ^), 



11 \ • 1 



with $\tilde\alpha_{W,r} = \langle \zeta_r, \hat\alpha_W - \alpha\rangle$. Using the Cauchy-Schwarz inequality as in (6.21),
the remainder of the proof consists in showing that $\|\hat\alpha_W - \alpha\| = O_P(1)$. This
is obtained by using the bounds obtained in the proof of (4.7) and following
the same lines of argument as for showing (6.8).






REFERENCES 

[1] Aneiros-Perez, G., Cardot, H., Estevez-Perez, G. and Vieu, P. (2004). Maximum
ozone concentration forecasting by functional nonparametric approaches.
Environmetrics 15 675-685.

[2] Bosq, D. (2000). Linear Processes in Function Spaces. Lecture Notes in Statist. 149.
Springer, New York. MR1783138

[3] Cardot, H. (2000). Nonparametric estimation of smoothed principal components
analysis of sampled noisy functions. J. Nonparametr. Statist. 12 503-538.
MR1785396

[4] Cai, T. T. and Hall, P. (2006). Prediction in functional linear regression. Ann.
Statist. 34 2159-2179. MR2291496

[5] Cardot, H., Crambes, C., Kneip, A. and Sarda, P. (2007). Smoothing splines
estimators in functional linear regression with errors-in-variables. Comput. Statist.
Data Anal. 51 4832-4848. MR2364543

[6] Cardot, H., Crambes, C. and Sarda, P. (2007). Ozone pollution forecasting. In
Statistical Methods for Biostatistics and Related Fields (W. Härdle, Y. Mori and
P. Vieu, eds.) 221-244. Springer, New York. MR2376412

[7] Cardot, H., Ferraty, F. and Sarda, P. (2003). Spline estimators for the
functional linear model. Statist. Sinica 13 571-591. MR1997162

[8] Cardot, H., Mas, A. and Sarda, P. (2007). CLT in functional linear regression
models. Probab. Theory Related Fields 138 325-361. MR2299711

[9] Chiou, J. M., Müller, H. G. and Wang, J. L. (2003). Functional quasi-likelihood
regression models with smoothed random effects. J. Roy. Statist. Soc. Ser. B 65
405-423. MR1983755

[10] Cuevas, A., Febrero, M. and Fraiman, R. (2002). Linear functional regression:
The case of a fixed design and functional response. Canadian J. Statistics 30
285-300. MR1926066

[11] Demmel, J. (1992). The componentwise distance to the nearest singular matrix.
SIAM J. Matrix Anal. Appl. 13 10-19. MR1146648

[12] Eilers, P. H. and Marx, B. D. (1996). Flexible smoothing with B-splines and
penalties. Statist. Sci. 11 89-102. MR1435485

[13] Eubank, R. L. (1988). Spline Smoothing and Nonparametric Regression. Dekker,
New York. MR0934016

[14] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis:
Methods, Theory, Applications and Implementations. Springer, London. MR2229687

[15] Fuller, W. A. (1987). Measurement Error Models. Wiley, New York. MR0898653

[16] Gasser, T., Sroka, L. and Jennen-Steinmetz, C. (1986). Residual variance and
residual pattern in nonlinear regression. Biometrika 73 625-633. MR0897854

[17] Golub, G. H. and Van Loan, C. F. (1980). An analysis of the total least squares
problem. SIAM J. Numer. Anal. 17 883-893. MR0595451

[18] Hall, P. and Horowitz, J. L. (2007). Methodology and convergence rates for
functional linear regression. Ann. Statist. To appear. MR2332269

[19] He, G., Müller, H.-G. and Wang, J. L. (2000). Extending correlation and
regression from multivariate to functional data. In Asymptotics in Statistics and
Probability (M. L. Puri, ed.) 301-315. VSP, Leiden.

[20] Kneip, A. (1994). Ordered linear smoothers. Ann. Statist. 22 835-866. MR1292543

[21] Li, Y. and Hsing, T. (2006). On rates of convergence in functional linear
regression. J. Multivariate Anal. Published online DOI: 10.1016/j.jmva.2006.10.004.
MR2392433






[22] Marx, B. D. and Eilers, P. H. (1999). Generalized linear regression on sampled
signals and curves: A P-spline approach. Technometrics 41 1-13.

[23] Müller, H.-G. and Stadtmüller, U. (2005). Generalized functional linear models.
Ann. Statist. 33 774-805. MR2163159

[24] Ramsay, J. O. and Dalzell, C. J. (1991). Some tools for functional data analysis.
J. Roy. Statist. Soc. Ser. B 53 539-572. MR1125714

[25] Ramsay, J. O. and Silverman, B. W. (2002). Applied Functional Data Analysis.
Springer, New York. MR1910407

[26] Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis,
2nd ed. Springer, New York. MR2168993

[27] Stone, C. J. (1982). Optimal global rates of convergence for nonparametric
regression. Ann. Statist. 10 1040-1053. MR0673642

[28] Utreras, F. (1983). Natural spline functions, their associated eigenvalue problem.
Numer. Math. 42 107-117. MR0716477

[29] Van Huffel, S. and Vandewalle, J. (1991). The Total Least Squares Problem:
Computational Aspects and Analysis. SIAM, Philadelphia. MR1118607

[30] Wahba, G. (1977). Practical approximate solutions to linear operator equations when
the data are noisy. SIAM J. Numer. Anal. 14 651-667. MR0471299

[31] Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia.
MR1045442

[32] Yao, F., Müller, H.-G. and Wang, J. L. (2005). Functional data analysis for
sparse longitudinal data. J. Amer. Statist. Assoc. 100 577-590. MR2160561



C. Crambes 
P. Sarda 

Université Paul Sabatier
Institut de Mathématiques
UMR 5219

Laboratoire de Statistique et Probabilités
118 Route de Narbonne
31062 Toulouse Cedex
France

E-mail: Christophe.Crambes@math.ups-tlse.fr
Pascal.Sarda@math.ups-tlse.fr



A. Kneip 

Statistische Abteilung 

Department of Economics and Hausdorff 

Center for Mathematics 

Universität Bonn

Adenauerallee 24-26 

53113 Bonn 

Germany 

E-mail: akneip@uni-bonn.de 



